
MarTech Platform: Docker Compose in Production to EKS in 4 Weeks

A marketing automation platform with 3,000 B2B customers was running its entire production stack with Docker Compose on two EC2 instances. As it signed larger enterprise customers, that architecture became the primary obstacle in both uptime guarantees and SOC 2 discussions. We migrated the platform to EKS in 4 weeks.

| Metric | Before (Docker Compose on EC2) | After (EKS) |
| --- | --- | --- |
| Deploy Time | docker-compose up on EC2 - 8 min, manual | GitHub Actions + ArgoCD - 5 min, automated |
| Deploy Frequency | 2–3/week (manual, scary) | Daily |
| Incidents | No automatic recovery, AZ risk, no scaling | Multi-AZ, self-healing, autoscaling |
| Cost Impact | - | Enterprise SLA now achievable - $280K ARR contract signed |

The Challenge

Docker Compose is not a production orchestrator. When a container crashed, nothing restarted it automatically. When one EC2 instance needed patching, the team had a choice: take downtime or skip the patch. There was no horizontal scaling - traffic spikes required a founder to SSH in and manually start more containers. The largest incoming enterprise client had a 99.9% uptime SLA requirement that the current setup physically could not meet.

The Approach

We translated the docker-compose.yml directly into Kubernetes manifests as the starting point, then layered in production requirements: health checks, HPA, pod anti-affinity across AZs, resource limits, and a proper CI/CD pipeline. The cutover itself was controlled through weighted DNS - effectively a feature flag at the DNS layer - which made the migration zero-downtime.

The Implementation

docker-compose to Kubernetes manifest translation

We used Kompose to generate initial Kubernetes manifests from the docker-compose.yml, then manually corrected the output. The translation surfaced three environment variables that existed only in the CTO's muscle memory and nowhere in documentation. We converted those to Kubernetes Secrets backed by AWS Secrets Manager via the External Secrets Operator (ESO).

Kompose · Kubernetes · External Secrets Operator · AWS Secrets Manager
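
For illustration, a minimal ExternalSecret sketch showing how one of the recovered variables can be wired through ESO. The secret names and Secrets Manager path are hypothetical, and it assumes a ClusterSecretStore for the AWS account already exists:

```yaml
# Hypothetical example: sync one of the recovered env vars from
# AWS Secrets Manager into a Kubernetes Secret via ESO.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-env-vars
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager      # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: api-env-vars             # resulting Kubernetes Secret, consumed via envFrom
  data:
    - secretKey: SENDGRID_API_KEY            # key in the k8s Secret (hypothetical)
      remoteRef:
        key: prod/api/sendgrid-api-key       # path in AWS Secrets Manager (hypothetical)
```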

EKS cluster with Karpenter autoscaling

We provisioned an EKS cluster via Terraform with Karpenter for node autoscaling. Two node pools: an On-Demand pool for the API and database proxy, and a Spot pool for the background job workers (email sending, data export, webhook delivery). Karpenter provisions new nodes in under 45 seconds.

AWS EKS · Karpenter · Terraform
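
As a sketch of what the Spot pool looks like under Karpenter's v1 NodePool API - the pool name, label, and taint are illustrative, and the On-Demand pool is the same shape with capacity-type "on-demand":

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: workers-spot               # illustrative name
spec:
  template:
    metadata:
      labels:
        workload: background-jobs
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]         # the API/DB-proxy pool uses ["on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      # Taint so only job workers (with a matching toleration) land here
      taints:
        - key: workload
          value: background-jobs
          effect: NoSchedule
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default              # assumes a default EC2NodeClass exists
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```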

Health checks and pod anti-affinity

Every workload got readiness and liveness probes. Pod anti-affinity rules spread API replicas across all three AZs - a single AZ failure now removes at most 33% of capacity instead of 50%. HPA was configured to scale on CPU and custom metrics (job queue depth for workers).

Kubernetes · HPA · Prometheus Adapter
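
Illustrative manifests for the API Deployment and the worker HPA. The image, port, probe paths, thresholds, and the job_queue_depth metric name are assumptions, not the platform's actual values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      affinity:
        podAntiAffinity:
          # The "single YAML block" from the takeaways: never co-locate
          # two API replicas in the same availability zone.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: api
              topologyKey: topology.kubernetes.io/zone
      containers:
        - name: api
          image: registry.example.com/api:v1   # placeholder
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker               # the background-job Deployment (assumed name)
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Queue depth must be exposed through the Prometheus Adapter's
    # external metrics API; the metric name is hypothetical.
    - type: External
      external:
        metric:
          name: job_queue_depth
        target:
          type: AverageValue
          averageValue: "100"
```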

Zero-downtime DNS cutover

We ran EKS and the legacy EC2 setup in parallel for 72 hours, with 5% of traffic routed to EKS via weighted DNS. After validating error rates and latency, we shifted to 100% over 30 minutes. The EC2 instances stayed live for 48 hours as a rollback target, then were terminated.

AWS Route 53 · AWS ALB · CloudWatch
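
The write-up doesn't specify how the weighted records were managed. If ExternalDNS were managing the Route 53 zone, the 5% canary record on the EKS side could be expressed as annotations like the sketch below; the same pair of weighted records can equally be created by hand in Route 53, with the legacy record held at weight 95. Domain and service names are placeholders:

```yaml
# Hypothetical sketch - assumes ExternalDNS manages the Route 53 zone.
# The legacy EC2 record keeps the same hostname at weight 95, managed
# outside the cluster.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  annotations:
    external-dns.alpha.kubernetes.io/set-identifier: eks-canary
    external-dns.alpha.kubernetes.io/aws-weight: "5"   # 5% of traffic to EKS
spec:
  ingressClassName: alb
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 8080
```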

Key Takeaways

  • Kompose is a useful starting point for the translation, but its output always requires manual correction - treat it as a first draft
  • Pod anti-affinity across AZs is a single YAML block that eliminates the most common single point of failure
  • Running old and new infrastructure in parallel with weighted DNS is the safest cutover pattern for customer-facing services
  • Spot instances for background job workers are the right cost optimisation on day one - they can handle interruptions gracefully

Facing Similar Challenges?

Book a free 30-minute audit and I will tell you what I see.
