Case Study: Building Scalable Cloud Infrastructure for Aperture Space’s ML Model
Client Overview
Client: Aperture Space
Industry: Environmental Analytics, Geospatial Data
Objective: Deploy the ML model Docker container on Google Cloud, create scalable and cost-efficient infrastructure, build API layer, and manage demand fluctuations.
Project Background
Aperture Space aimed to create a scalable solution for their satellite vision ML model. They needed to migrate their local Docker container to Google Cloud for remote operations and build a system that adjusts based on demand.
Results & Impact
- Scalability: Aperture Space’s ML model now runs in a fully scalable environment, handling demand surges with 95% less downtime thanks to automated scaling.
- Efficiency: Processing times dropped 80% with parallel processing, while cutting cloud costs by 60%, saving up to $25,000 annually.
- Automation: Deployment times shrank by 90%, enabling seamless updates in under 2 minutes.
By the Numbers
- 95% downtime reduction during peak use.
- 80% faster processing.
- Up to $25,000/year savings.
Key Deliverables and Challenges
- Cloud Infrastructure Setup: Migrate the Docker container to Google Cloud and configure it for a scalable, production-ready environment.
- API Development & Deployment: Wrap the model in an API, deploy on a VM, and integrate with Google Cloud services.
- Scalability & Automation: Design a system that scales with demand, using load balancers and a bulk run module.
- Error Handling & Monitoring: Implement logging, error handling, and cloud monitoring for robust performance.
Solution Overview
The solution was divided into phases, each targeting cloud migration, API development, scalability, and monitoring.
Phase 1: Project Setup & Cloud Resource Provisioning
- Objective: Set up the foundation for cloud migration and establish a task plan.
- Key Actions:
- Reviewed Docker container to identify dependencies and plan API tasks.
- Provisioned cloud resources using Google Cloud Run, Google Cloud Storage (GCS), and a managed relational database (Google Cloud SQL).
- Conducted basic testing to validate the initial cloud setup.
Phase 2: API Wrapper Development & Deployment
- Objective: Build an async API to interact with the model.
- Key Actions:
- Developed an API wrapper outside the Docker container for easier development.
- Developed a user authentication layer with hashed API keys.
- Enabled asynchronous model runs: background jobs trigger the model as a Cloud Run job, with a separate fetch API to retrieve results and status.
- Deployed the API on a Google Cloud VM, ensuring smooth interaction with GCS.
- Automated deployment with CircleCI pipelines linked to GitHub.
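The hashed-API-key scheme from this phase can be sketched as follows. This is a minimal illustration, not the client's actual code: only a SHA-256 hash of each key is stored server-side, and incoming keys are checked with a constant-time comparison.

```python
import hashlib
import hmac
import secrets

def generate_api_key() -> tuple[str, str]:
    """Create a new API key and the hash that is stored server-side.

    The plaintext key is shown to the user once; only the hash is persisted.
    """
    key = secrets.token_urlsafe(32)
    key_hash = hashlib.sha256(key.encode()).hexdigest()
    return key, key_hash

def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time to avoid timing leaks."""
    presented_hash = hashlib.sha256(presented_key.encode()).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

In the deployed API, the stored hash would live in the database alongside the user record, so a database leak never exposes usable keys.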
Phase 3: Cloud Relational Database Integration
- Objective: Implement cloud-based storage and database solutions.
- Key Actions:
- Integrated Google Cloud SQL with the cloud infrastructure.
- Updated data storage code for seamless interaction between the API, database, and storage.
- Developed a queueing system for jobs to handle peak throughput.
Phase 4: Scalability Implementation
- Objective: Design the system to scale and enable bulk jobs.
- Key Actions:
- Developed a bulk run module for batch processing of input CSV files.
- Integrated output handling to GCS for efficient result dumping.
- Configured Google Cloud Load Balancer to distribute workload and scale based on demand.
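The bulk run module's batching step can be illustrated as below. This is a hedged sketch, not the delivered module: it splits an input CSV into fixed-size batches so each batch can be dispatched as its own model run, with results then written to GCS.

```python
import csv
import io
from typing import Iterator

def batch_csv_rows(csv_text: str, batch_size: int) -> Iterator[list[dict]]:
    """Yield rows from a CSV string in fixed-size batches.

    Each batch would be handed to a separate model run for parallel
    processing, with outputs dumped to a GCS bucket.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    batch: list[dict] = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch
```

Keeping batching separate from dispatch means the same module works whether batches go to Cloud Run jobs, local workers, or a test harness.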
Phase 5: Error Handling & Monitoring
- Objective: Implement robust logging and monitoring.
- Key Actions:
- Set up error handling within the API.
- Configured multi-level logging using Google Cloud Logging and Google Cloud Monitoring.
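A minimal sketch of the multi-level logging setup, using the standard library. Names here are illustrative; in production a Google Cloud Logging handler would be attached in place of (or alongside) the stream handler so entries flow into Cloud Monitoring.

```python
import logging

def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Configure a namespaced logger with a leveled, structured format.

    In the deployed system, a Cloud Logging handler would be attached here
    so WARNING/ERROR entries surface in Google Cloud Monitoring alerts.
    """
    logger = logging.getLogger("aperture_api")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on re-configuration
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Routing everything through one named logger lets operators raise or lower verbosity per environment without touching call sites.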
Conclusion
The migration of Aperture Space's model to Google Cloud enabled a scalable, efficient, and robust system. The fully integrated cloud solution, including API development, database integration, and automated scaling, positioned the client to meet growing demand for their analytics service.