E-commerce · 2025-07

RetailTech Platform: $22K AWS Bill to $10.8K in 6 Weeks

A retail analytics SaaS company had watched their AWS bill grow from $8K to $22K/month over two years with no corresponding growth in customers or revenue. A structured cost audit identified $11K/month in waste across five categories. The fix took 6 weeks and required no application code changes.

Before: $22,000/month AWS spend (and growing)
After: $10,800/month AWS spend
Cost Impact: $11,200/month saved ($134K/year)

The Challenge

The engineering team had optimised for velocity and had never looked at the bill systematically. The result:

  • A data pipeline whose on-demand r5.4xlarge instances ran 24/7 to process 4-hour daily batches
  • ElastiCache clusters sized for the previous year's peak traffic
  • RDS instances running entirely On-Demand, with no reserved capacity
  • A log aggregation setup shipping 2TB/day to CloudWatch Logs at $0.50/GB ingestion

The Approach

We ran a 3-day cost discovery sprint: pull the AWS Cost and Usage Report, tag every resource with service and team, identify the top 10 cost drivers, then model the savings from each optimisation. We presented a prioritised fix list sorted by impact-to-effort ratio and implemented the fixes in that order.
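
For surfacing the top drivers, the Cost Explorer API gives the same breakdown as the Cost and Usage Report with far less setup. A minimal sketch, assuming boto3 credentials are configured; the date range is a placeholder:

```python
import boto3

# Monthly cost grouped by service: the quickest way to rank cost drivers.
ce = boto3.client("ce", region_name="us-east-1")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-05-01", "End": "2025-06-01"},  # placeholder month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Flatten and sort to get the top 10 cost drivers for the month.
rows = [
    (g["Keys"][0], float(g["Metrics"]["UnblendedCost"]["Amount"]))
    for g in resp["ResultsByTime"][0]["Groups"]
]
for service, cost in sorted(rows, key=lambda r: r[1], reverse=True)[:10]:
    print(f"{service:<45} ${cost:>12,.2f}")
```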

The Implementation

Data pipeline: on-demand to Spot with EMR

The Spark batch job ran for roughly 4 hours a day, but its 8× r5.4xlarge On-Demand instances stayed up around the clock. We migrated the pipeline to EMR on EC2 with a mixed Spot/On-Demand instance fleet (an 80/20 split), provisioned only during the processing window via a scheduled EventBridge trigger. The pipeline now runs the same job at 73% lower cost.

AWS EMR · Spot Instances · EventBridge · Apache Spark
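
A sketch of what the transient cluster definition can look like with boto3, assuming EMR instance fleets; the cluster name, subnet, capacities, and job path are illustrative, not the client's actual configuration:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Transient cluster: launched by the scheduled EventBridge rule, terminates
# itself when the Spark step finishes, so nothing idles between batches.
resp = emr.run_job_flow(
    Name="daily-analytics-batch",             # illustrative
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Spark"}],
    ServiceRole="EMR_DefaultRole",
    JobFlowRole="EMR_EC2_DefaultRole",
    Instances={
        "Ec2SubnetIds": ["subnet-0abc1234"],   # placeholder
        "KeepJobFlowAliveWhenNoSteps": False,  # auto-terminate after the job
        "InstanceFleets": [
            {
                "InstanceFleetType": "MASTER",
                "TargetOnDemandCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "r5.xlarge"}],
            },
            {
                # The 80/20 Spot/On-Demand split for the workers.
                "InstanceFleetType": "CORE",
                "TargetSpotCapacity": 6,
                "TargetOnDemandCapacity": 2,
                "InstanceTypeConfigs": [
                    {"InstanceType": "r5.4xlarge"},
                    {"InstanceType": "r5a.4xlarge"},  # widens the Spot pool
                ],
            },
        ],
    },
    Steps=[{
        "Name": "daily-batch",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/daily_batch.py"],
        },
    }],
)
print("Launched cluster:", resp["JobFlowId"])
```

With instance fleets, EMR fills the Spot target from whichever listed instance type has capacity, which keeps interruption risk manageable for a 4-hour window.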

ElastiCache right-sizing and reserved nodes

Three ElastiCache clusters were sized at the previous year's peak with no autoscaling. We right-sized two from cache.r6g.2xlarge to cache.r6g.large (current P95 memory usage was 22% of capacity), and purchased 1-year Reserved Nodes for all three. Total ElastiCache savings: $3,200/month.

AWS ElastiCache · CloudWatch · Reserved Nodes
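
The right-sizing call came from memory headroom data along these lines; a sketch using CloudWatch's p95 statistic for the Redis memory metric, with a placeholder cluster ID:

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch", region_name="us-east-1")

# p95 memory usage per hour over the last 30 days for one cache node.
# DatabaseMemoryUsagePercentage is the memory metric ElastiCache for Redis
# publishes to CloudWatch; "analytics-cache-001" is a placeholder cluster ID.
end = datetime.now(timezone.utc)
resp = cw.get_metric_statistics(
    Namespace="AWS/ElastiCache",
    MetricName="DatabaseMemoryUsagePercentage",
    Dimensions=[{"Name": "CacheClusterId", "Value": "analytics-cache-001"}],
    StartTime=end - timedelta(days=30),
    EndTime=end,
    Period=3600,                       # hourly datapoints
    ExtendedStatistics=["p95"],
)

worst = max(dp["ExtendedStatistics"]["p95"] for dp in resp["Datapoints"])
print(f"Worst hourly p95 memory usage over 30 days: {worst:.1f}%")
# A sustained ~22% on a cache.r6g.2xlarge is the kind of reading that
# justifies dropping to cache.r6g.large (roughly a quarter of the memory).
```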

CloudWatch Logs to S3 + Athena

Application logs were being shipped directly to CloudWatch Logs at $0.50/GB ingestion. At 2TB/day that works out to roughly $30,000/month at list price, the single largest line item on the bill. We implemented a tiered logging strategy: errors and warnings still go to CloudWatch (around 50GB/day), while full logs are delivered via Kinesis Firehose to S3 and queried with Athena when needed. Log ingestion cost dropped from $30K to $750/month.

Kinesis Firehose · AWS S3 · Amazon Athena · CloudWatch
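
On the retrieval side, the archived logs stay queryable without re-ingesting anything. A sketch of an ad-hoc Athena query via boto3; the database, table, columns, and result bucket are hypothetical, and the table is assumed to be partitioned by date:

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Ad-hoc query against the full log archive Firehose delivered to S3.
QUERY = """
SELECT request_id, status, latency_ms
FROM   app_logs
WHERE  dt = '2025-07-01'        -- partition pruning keeps the scan cheap
  AND  status >= 500
LIMIT  100
"""

resp = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "logs"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("Query started:", resp["QueryExecutionId"])
```

Athena charges per TB scanned, so partitioning the archive by date (and compressing it) is what keeps the occasional deep query cheap.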

RDS Reserved Instances and idle instance decommission

All five RDS instances ran On-Demand with no reserved capacity. We purchased 1-year Reserved Instances for the three production databases (a 34% discount) and identified two development databases that had not received a connection in 47 days. We snapshotted both and terminated the instances.

AWS RDS · Reserved Instances · CloudWatch
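
Both the idle check and the decommission are scriptable; a sketch with placeholder instance IDs (delete_db_instance takes a final snapshot as part of termination, matching the snapshot-then-terminate step above):

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch", region_name="us-east-1")
rds = boto3.client("rds", region_name="us-east-1")

def peak_connections(instance_id: str, days: int = 47) -> float:
    """Peak DatabaseConnections over the window; 0.0 means nothing connected."""
    end = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="DatabaseConnections",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=86400,                  # daily datapoints
        Statistics=["Maximum"],
    )
    return max((dp["Maximum"] for dp in resp["Datapoints"]), default=0.0)

for instance_id in ["dev-db-1", "dev-db-2"]:   # placeholder identifiers
    if peak_connections(instance_id) == 0.0:
        # Terminate with a final snapshot so the data stays recoverable.
        rds.delete_db_instance(
            DBInstanceIdentifier=instance_id,
            FinalDBSnapshotIdentifier=f"{instance_id}-final",
            DeleteAutomatedBackups=False,
        )
        print(f"Decommissioned {instance_id} (final snapshot taken)")
```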

Key Takeaways

  • CloudWatch Logs ingestion cost is invisible until you look - at $0.50/GB it is the single most common surprise in AWS cost audits
  • Spot instances for batch workloads are the highest-ROI change in most data infrastructure audits - 60–80% savings with zero application changes
  • Reserved Instances and Savings Plans should be reviewed every quarter - teams consistently leave this on the table for 12–18 months
  • Two idle RDS instances had been running for 47 days consuming $1,400/month - a weekly cost audit would have caught this in week two

Facing Similar Challenges?

Book a free 30-minute audit and I will tell you what I see.
