AI / ML · 2025-08

AI Startup: Production MLOps Platform in 6 Weeks

A computer vision startup had five models trained in Jupyter notebooks served by ad-hoc scripts on a single GPU instance. Models could not be versioned, retraining was manual, and two models had silently degraded for weeks. We built a production MLOps platform from scratch in 6 weeks.

  • Deploy time: manual (hours per model) → automated pipeline, 35 minutes end-to-end
  • Deploy frequency: ad-hoc → weekly automated retraining per model
  • Incidents: 2 silently degraded models discovered retroactively → drift alerts within 24 hours of distribution shift
  • Cost impact: $4K/month saved (GPU idle time eliminated)

The Challenge

The ML team was strong. The infrastructure was not. Models were trained locally, artifacts were copied to S3 with names like model_final_v2_REAL.pkl, served by a Flask app in a screen session on an EC2 instance, and retrained when someone remembered to. The engineering team wanted to scale to 20+ models - which was impossible with the current setup.

The Approach

We scoped the engagement around three missing capabilities: experiment tracking and model versioning, a repeatable training pipeline, and production monitoring with drift detection. We built the infrastructure layer around the existing ML work without redesigning any models.

The Implementation

MLflow experiment tracking and model registry

We deployed MLflow on Kubernetes with a PostgreSQL backend and S3 artifact store. The data science team instrumented their training scripts in two days. The model registry replaced the S3 bucket naming convention with a proper versioning and staging system.

MLflow · PostgreSQL · AWS S3 · Kubernetes
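The registry's core contract - append-only version numbers plus named stages - is what replaced the file-naming convention. A toy stand-in in plain Python (illustrative only; the real system uses the MLflow Model Registry API, and these class and method names are not MLflow's):

```python
class ModelRegistry:
    """Toy stand-in for a model registry: append-only versions + named stages."""

    def __init__(self):
        self._versions = {}  # model name -> list of artifact URIs (index + 1 = version)
        self._stages = {}    # (model name, stage) -> version number

    def register(self, name, artifact_uri):
        """Every registration creates a new immutable version."""
        self._versions.setdefault(name, []).append(artifact_uri)
        return len(self._versions[name])

    def transition(self, name, version, stage):
        """Point a named stage (e.g. "Production") at one specific version."""
        self._stages[(name, stage)] = version

    def get(self, name, stage):
        """Resolve the artifact currently assigned to a stage."""
        version = self._stages[(name, stage)]
        return self._versions[name][version - 1]


registry = ModelRegistry()
v1 = registry.register("detector", "s3://models/detector/run-1/model.pkl")
v2 = registry.register("detector", "s3://models/detector/run-2/model.pkl")
registry.transition("detector", v2, "Production")
print(registry.get("detector", "Production"))  # the run-2 artifact, unambiguously
```

The point is that "which model is live" becomes a registry lookup instead of a guess about which `_final_v2_REAL` file is newest.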

Kubeflow training pipelines

We built Kubeflow Pipelines for the three highest-frequency training jobs. Each pipeline pulls versioned training data, runs preprocessing as a containerised step, trains on a GPU node, evaluates against a validation set, and registers the model if it meets the quality threshold.

Kubeflow Pipelines · AWS EKS · GPU node pool · MLflow
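The step worth pinning down is the quality gate: a candidate model is registered only if it clears the evaluation threshold, otherwise the current production model stays in place. A minimal sketch of that control flow in plain Python (each function stands in for one containerised Kubeflow step; the step names and the 0.85 threshold are illustrative, not the engagement's actual values):

```python
def run_training_pipeline(pull_data, preprocess, train, evaluate, register,
                          quality_threshold=0.85):
    """Each argument stands in for one containerised pipeline step."""
    raw = pull_data()                         # versioned training data
    features, labels = preprocess(raw)        # containerised preprocessing
    model = train(features, labels)           # GPU training step
    score = evaluate(model)                   # validation-set evaluation
    if score >= quality_threshold:
        return register(model), score         # promote the candidate
    return None, score                        # keep current production model


# Toy step implementations, just to show the flow:
version, score = run_training_pipeline(
    pull_data=lambda: list(range(10)),
    preprocess=lambda raw: (raw, [x % 2 for x in raw]),
    train=lambda X, y: {"weights": sum(X)},
    evaluate=lambda model: 0.91,
    register=lambda model: "detector:v3",
)
print(version, score)  # detector:v3 0.91 - candidate cleared the gate
```

Because the gate sits inside the pipeline, a bad training run can never silently replace a good production model.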

BentoML model serving

We replaced the Flask scripts with BentoML-packaged model servers deployed as Kubernetes Deployments. Each model runs as an independent service pulling the production-stage model from the MLflow registry on startup.

BentoML · Kubernetes · AWS ALB · MLflow
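The serving property that matters is startup-time resolution: each service asks the registry for the Production-stage artifact rather than baking a file path into the image. A sketch of that startup step (illustrative names; in the real system this lookup hits the MLflow registry and the artifact lives in S3, whereas here a local file stands in):

```python
import os
import pickle
import tempfile


def load_production_model(registry_get, name):
    """Resolve and load the Production-stage artifact at service startup."""
    artifact_uri = registry_get(name, "Production")
    # In production this would be an S3 download; here we read a local file.
    with open(artifact_uri, "rb") as f:
        return pickle.load(f)


# Toy demonstration with a local artifact standing in for the registry + S3:
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump({"name": "detector", "version": 2}, f)
    model = load_production_model(lambda n, stage: path, "detector")

print(model["version"])  # 2
```

With this pattern, promoting a new model is a registry stage transition plus a pod restart - no image rebuild, no code change.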

Evidently drift monitoring

A daily Airflow job computes drift metrics between the training distribution and live prediction inputs. Two models triggered alerts in the first week; both were retrained and redeployed within 48 hours.

Evidently AI · Apache Airflow · Slack · Prometheus
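At its core, the daily job compares the binned distribution of each live input feature against its training baseline. A minimal Population Stability Index sketch shows the shape of that computation (PSI is one common drift metric; Evidently ships several, and the bin edges and the ~0.2 alert threshold here are illustrative conventions, not the engagement's tuned values):

```python
import math


def psi(expected, actual, bin_edges):
    """Population Stability Index between two samples over fixed bins."""

    def proportions(sample):
        counts = [0] * (len(bin_edges) - 1)
        for x in sample:
            for i in range(len(counts)):
                if bin_edges[i] <= x < bin_edges[i + 1]:
                    counts[i] += 1
                    break
        total = max(sum(counts), 1)
        # Small floor avoids log(0) for empty bins.
        return [max(c / total, 1e-4) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
training = [i / 100 for i in range(100)]         # roughly uniform baseline
live = [min(i / 100, 0.49) for i in range(100)]  # mass shifted below 0.5
score = psi(training, live, edges)
print(score > 0.2)  # True - PSI above ~0.2 is a common "significant shift" alert level
```

In the real setup the same comparison runs per feature per model, and any score over the alert threshold pages the team via Slack.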

Key Takeaways

  • MLflow is the right first investment for any ML team - experiment tracking pays off immediately
  • Training-serving skew is the hardest bug to debug and the easiest to prevent
  • Drift monitoring found two underperforming models in the first week that the team had not noticed
  • BentoML versioning means deployments are reproducible - the exact model artifact and dependencies are versioned together

Facing Similar Challenges?

Book a free 30-minute audit and I will tell you what I see.
