Case Study: Building Scalable Cloud Infrastructure for Aperture Space’s ML Model

Client Overview

Client: Aperture Space
Industry: Environmental Analytics, Geospatial Data

Objective: Deploy the ML model's Docker container on Google Cloud, create scalable and cost-efficient infrastructure, build an API layer, and manage demand fluctuations.

Project Background

Aperture Space aimed to create a scalable solution for their satellite vision ML model. They needed to migrate their local Docker container to Google Cloud for remote operations and build a system that adjusts capacity based on demand.

Results & Impact

  • Scalability: Aperture Space’s ML model now runs in a fully scalable environment, handling demand surges with 95% less downtime thanks to automated scaling.
  • Efficiency: Parallel processing cut processing times by 80% and cloud costs by 60%, saving up to $25,000 annually.
  • Automation: Deployment times shrank by 90%, enabling seamless updates in under 2 minutes.

By the Numbers

  • 95% downtime reduction during peak use.
  • 80% faster processing.
  • Up to $25,000/year savings.

Key Deliverables and Challenges

  1. Cloud Infrastructure Setup: Migrate the Docker container to Google Cloud and configure it for a scalable, production-ready environment.
  2. API Development & Deployment: Wrap the model in an API, deploy on a VM, and integrate with Google Cloud services.
  3. Scalability & Automation: Design a system that scales with demand, using load balancers and a bulk run module.
  4. Error Handling & Monitoring: Implement logging, error handling, and cloud monitoring for robust performance.

Solution Overview

The solution was divided into five phases, covering cloud migration, API development, database integration, scalability, and monitoring.

Phase 1: Project Setup & Cloud Resource Provisioning

  • Objective: Set up the foundation for cloud migration and establish a task plan.
  • Key Actions:
    • Reviewed Docker container to identify dependencies and plan API tasks.
    • Provisioned cloud resources using Google Cloud Run, Google Cloud Storage (GCS), and a managed relational database.
    • Conducted basic testing to validate the initial cloud setup.

Phase 2: API Wrapper Development & Deployment

  • Objective: Build an async API to interact with the model.
  • Key Actions:
    • Developed an API wrapper outside the Docker container for easier development.
    • Developed a user authentication layer with hashed API keys.
    • Enabled asynchronous model runs: background jobs trigger the model as a Cloud Run job, and a separate fetch API returns results and status.
    • Deployed the API on a Google Cloud VM, ensuring smooth interaction with GCS.
    • Automated deployment with CircleCI pipelines linked to GitHub.
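The async pattern from this phase can be sketched in a few lines. This is a minimal standard-library illustration, not the client's code: the web framework is omitted, the Cloud Run job is stubbed with a local thread, and the key `demo-key-123` is an invented placeholder. The shape is what matters: only hashes of API keys are stored, submission returns a job id immediately, and a separate fetch call reports status and results.

```python
import hashlib
import threading
import time
import uuid

# Store only SHA-256 hashes of issued API keys, never the keys themselves.
_VALID_KEY_HASHES = {hashlib.sha256(b"demo-key-123").hexdigest()}

_jobs = {}  # job_id -> {"status": ..., "result": ...}

def _authorized(api_key: str) -> bool:
    return hashlib.sha256(api_key.encode()).hexdigest() in _VALID_KEY_HASHES

def submit_run(api_key: str, payload: dict) -> str:
    """Start a model run in the background and return a job id immediately."""
    if not _authorized(api_key):
        raise PermissionError("invalid API key")
    job_id = str(uuid.uuid4())
    _jobs[job_id] = {"status": "running", "result": None}

    def _run():
        # Stand-in for triggering the Cloud Run job and awaiting completion.
        time.sleep(0.1)
        _jobs[job_id] = {"status": "done", "result": {"echo": payload}}

    threading.Thread(target=_run, daemon=True).start()
    return job_id

def fetch_result(api_key: str, job_id: str) -> dict:
    """Separate fetch API: return job status (and result once finished)."""
    if not _authorized(api_key):
        raise PermissionError("invalid API key")
    return _jobs[job_id]
```

Because the caller gets a job id back instantly, long model runs never hold an HTTP connection open; clients poll the fetch endpoint instead.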

Phase 3: Cloud Relational Database Integration

  • Objective: Implement cloud-based storage and database solutions.
  • Key Actions:
    • Integrated Google Cloud SQL with the cloud infrastructure.
    • Updated data storage code for seamless interaction between the API, database, and storage.
    • Developed a queueing system for jobs to handle peak throughput.
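The queueing idea above can be sketched with the standard library: incoming jobs are enqueued immediately and a fixed pool of workers drains them, so a burst of requests never overwhelms the model backend. Worker count, job shape, and the handler body are illustrative assumptions; the real system wrote results to Cloud SQL rather than an in-memory list.

```python
import queue
import threading

MAX_WORKERS = 4  # illustrative; tuned to backend capacity in practice
job_queue: "queue.Queue" = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        job = job_queue.get()
        if job is None:  # sentinel: shut this worker down
            job_queue.task_done()
            return
        # Stand-in for invoking the model and persisting to Cloud SQL.
        with results_lock:
            results.append(job["id"])
        job_queue.task_done()

threads = [threading.Thread(target=worker, daemon=True) for _ in range(MAX_WORKERS)]
for t in threads:
    t.start()

# Enqueue a burst of jobs, then wait for the queue to drain.
for i in range(20):
    job_queue.put({"id": i})
job_queue.join()
for _ in threads:
    job_queue.put(None)
```

The queue decouples request rate from processing rate: peaks simply lengthen the queue, and the worker pool caps concurrent load on the model.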

Phase 4: Scalability Implementation

  • Objective: Design the system to scale and enable bulk jobs.
  • Key Actions:
    • Developed a bulk run module for batch processing of input CSV files.
    • Integrated output handling to GCS for efficient result dumping.
    • Configured Google Cloud Load Balancer to distribute workload and scale based on demand.
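The bulk-run flow can be illustrated as a small function: read an input CSV, run the model on each row, and serialize the results as a CSV string ready to upload to GCS. The column names (`site_id`, `value`) and the model stub are invented for the sketch, not the client's actual schema or inference code.

```python
import csv
import io

def run_model(row: dict) -> dict:
    # Stand-in for the actual ML inference call.
    return {"site_id": row["site_id"], "score": float(row["value"]) * 2}

def bulk_run(input_csv: str) -> str:
    """Batch-process every row of an input CSV and return an output CSV."""
    reader = csv.DictReader(io.StringIO(input_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["site_id", "score"])
    writer.writeheader()
    for row in reader:
        writer.writerow(run_model(row))
    # In production, this string is dumped to a GCS bucket as the result file.
    return out.getvalue()
```

Keeping the module a pure string-to-string transform makes it easy to test locally and to fan out across rows or files when run behind the load balancer.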

Phase 5: Error Handling & Monitoring

  • Objective: Implement robust logging and monitoring.
  • Key Actions:
    • Set up error handling within the API.
    • Configured multi-level logging using Google Cloud Logging and Google Cloud Monitoring.
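The error-handling and logging pattern can be sketched with the standard `logging` module; in production the records were shipped to Google Cloud Logging, but a stream handler stands in for it here. The logger name and handler body are illustrative assumptions.

```python
import logging
import sys

def configure_logging(level=logging.INFO) -> logging.Logger:
    """Multi-level logging setup; Cloud Logging replaces stdout in production."""
    logger = logging.getLogger("aperture_api")  # hypothetical logger name
    logger.setLevel(level)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger

def handle_request(logger: logging.Logger, payload: dict) -> dict:
    """API handler that turns failures into logged, structured error responses."""
    try:
        return {"status": "ok", "result": payload["input"] * 2}
    except KeyError:
        logger.warning("bad request: missing 'input' field")
        return {"status": "error", "detail": "missing 'input' field"}
```

Wrapping each handler this way means a malformed request produces a warning-level log entry and a clean error response instead of an unhandled exception.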

Conclusion

The migration of Aperture Space’s model to Google Cloud enabled a scalable, efficient, and robust system. The fully integrated cloud solution, including API development, database integration, and automated scaling, positioned the client to meet growing demand for their analytics service.