HR Tech Platform: Jenkins Migration, 3.5-Hour CI to 18 Minutes
A workforce management SaaS had been running Jenkins on a dedicated EC2 instance since 2019. Build times for the main application had grown to 3.5 hours, and engineers had stopped running the full suite locally. We migrated to GitHub Actions with parallelised test execution and cut CI time by 91%, from 3.5 hours to 18 minutes.
The Challenge
Jenkins had accumulated five years of plugins, Groovy pipelines nobody fully understood, and a habit of failing for reasons unrelated to the code under test. The Jenkins VM had been upsized three times and was now an m5.2xlarge costing $280/month. Build queues regularly backed up to 45 minutes. Engineers had started merging without waiting for CI to pass, which left main broken again and again.
The Approach
We audited the existing Jenkins pipeline to understand what it actually did (not what the docs said it did), rewrote it as GitHub Actions workflows, introduced parallelised test sharding to eliminate the primary bottleneck, and decommissioned the Jenkins EC2 instance. Timeline: 3 weeks.
The Implementation
Jenkins audit and test suite analysis
The 3.5-hour build was dominated by a sequential test suite of 1,847 RSpec tests running on a single thread. We profiled the suite and found that 340 tests (about 18% of the suite) accounted for 80% of the runtime, most of them integration tests hitting a real PostgreSQL database. Parallelisation was the only viable fix.
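The profiling step can be sketched as follows. This is a minimal illustration, assuming per-test durations have already been exported into a mapping of test name to seconds; the function name and the sample data are hypothetical and are not the tooling used on the project.

```python
def slowest_tests_covering(durations, share=0.8):
    """Return the slowest tests, slowest first, whose combined runtime
    first reaches `share` of the suite's total runtime."""
    total = sum(durations.values())
    ranked = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, seconds in ranked:
        if running >= share * total:
            break
        selected.append(name)
        running += seconds
    return selected


# Hypothetical sample: two slow integration tests and eight fast unit tests.
sample = {"integration_a": 45.0, "integration_b": 45.0}
sample.update({f"unit_{i}": 1.25 for i in range(8)})
hot = slowest_tests_covering(sample)
print(f"{len(hot)} of {len(sample)} tests cover 80% of runtime: {hot}")
```

Running the same calculation over real timing data shows whether the runtime is concentrated in a small group of tests, which is what decides whether to optimise individual tests or to shard the whole suite.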
Parallelised test sharding with GitHub Actions matrix
We split the test suite across 8 parallel runners using the RSpec split gem and a GitHub Actions matrix strategy. Each runner provisions its own PostgreSQL service container and runs 1/8 of the suite. Total parallel wall-clock time: 11 minutes. Full pipeline with build and deploy: 18 minutes.
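One way to balance shards is sketched below. The source does not say how the gem divides the suite, so this greedy longest-first assignment is a substitute chosen for illustration, not the gem's algorithm. The timings and names are hypothetical. In a matrix build, each runner would compute the same assignment and run only the shard matching its own index.

```python
import heapq


def assign_shards(durations, shard_count):
    """Greedy longest-first assignment: place each test, slowest first,
    on whichever shard currently has the least total runtime."""
    heap = [(0.0, index) for index in range(shard_count)]  # (load, shard index)
    heapq.heapify(heap)
    shards = [[] for _ in range(shard_count)]
    # Sort by duration descending, then by name so every runner agrees.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return shards


# Hypothetical timings in seconds; a real run would load recorded durations.
timings = {"a_spec": 9.0, "b_spec": 7.0, "c_spec": 4.0, "d_spec": 3.0, "e_spec": 1.0}
shards = assign_shards(timings, shard_count=2)
loads = [sum(timings[name] for name in shard) for shard in shards]
print(shards, loads)
```

Splitting by recorded duration rather than by file count matters when a minority of tests dominates the runtime, because the slowest shard sets the wall-clock time for the whole stage.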
Dependency caching and Docker layer optimisation
We added gem bundle caching keyed on Gemfile.lock (reducing install time from 4.5 minutes to 35 seconds), Docker layer caching via GitHub Actions cache, and a multi-stage Dockerfile that separates the gem install layer from the application code layer.
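The idea behind keying the cache on Gemfile.lock can be sketched as below: the key is derived from the lockfile's contents, so the cached gems are reused until a dependency actually changes. This is an illustration only. The `gems` prefix, the function name, and the placeholder lockfile contents are assumptions, and in a real workflow the key would be computed by the CI system's own cache step.

```python
import hashlib
import os
import tempfile


def bundle_cache_key(lockfile_path, prefix="gems"):
    """Build a cache key that changes only when the lockfile's bytes change."""
    with open(lockfile_path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    return f"{prefix}-{digest}"


# Demonstration with a throwaway lockfile; the contents are placeholders.
with tempfile.TemporaryDirectory() as workdir:
    lockfile = os.path.join(workdir, "Gemfile.lock")
    with open(lockfile, "w") as handle:
        handle.write("rspec (3.12.0)\n")
    key_before = bundle_cache_key(lockfile)
    key_unchanged = bundle_cache_key(lockfile)
    with open(lockfile, "a") as handle:
        handle.write("rake (13.0.6)\n")  # simulate adding a dependency
    key_after = bundle_cache_key(lockfile)

print(key_before == key_unchanged, key_before == key_after)
```

The same reasoning applies to the Docker layers: keeping the gem install in its own layer means an application code change does not invalidate the expensive step.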
Branch protection and required status checks
We enforced branch protection on main, requiring all CI checks to pass before merge, so engineers could no longer merge code that failed CI. We also added a Slack notification on main branch failure with a direct link to the failing job. Within one week, the team stopped treating CI as optional.
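Branch protection can be set in the repository settings or through GitHub's REST API. The sketch below builds a request body of the shape accepted by the "update branch protection" endpoint, without sending it. The check names are hypothetical, and the `strict` and `enforce_admins` values are assumptions rather than the settings used on this project. Field names should be confirmed against the current GitHub documentation before use.

```python
import json


def branch_protection_payload(required_checks):
    """Build a request body for GitHub's 'update branch protection' endpoint
    (PUT /repos/{owner}/{repo}/branches/{branch}/protection)."""
    return {
        "required_status_checks": {
            "strict": True,  # assumption: branch must be up to date before merge
            "contexts": sorted(required_checks),
        },
        "enforce_admins": True,  # assumption: admins are held to the same rule
        "required_pull_request_reviews": None,  # review rules left unset here
        "restrictions": None,  # no push restrictions
    }


# Hypothetical check names; real ones must match the workflow's job names.
payload = branch_protection_payload(["build", "test (shard 1)", "test (shard 2)"])
print(json.dumps(payload, indent=2))
```

Listing every shard as a required check is what makes the rule effective: a merge is blocked if any one of the parallel jobs fails.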
Key Takeaways
- Test parallelisation is usually the highest-ROI CI optimisation for suites over 500 tests - with balanced shards, wall-clock time falls roughly in proportion to the shard count, until fixed setup time and the slowest individual tests dominate
- Jenkins maintenance overhead is invisible until you remove it - teams routinely underestimate the hours spent on plugin updates and flaky agent issues
- Branch protection is the policy change that makes the technical investment pay off - fast CI only matters if engineers have to wait for it
- Profiling the test suite before optimising it is essential - a small share of tests accounting for most of the runtime (here, 340 of 1,847 tests for 80%) is the typical pattern
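How far sharding can take wall-clock time can be estimated with a simple lower bound: no shard finishes before its longest single test, and the total work cannot be spread thinner than total runtime divided by shard count. The sketch below uses a hypothetical suite and a hypothetical fixed per-run overhead, not figures from this project.

```python
def best_case_wall_clock(test_minutes, shard_count, overhead_minutes=0.0):
    """Lower bound on wall-clock time: the larger of an even split of the
    total work and the longest single test, plus fixed overhead."""
    total = sum(test_minutes)
    return overhead_minutes + max(total / shard_count, max(test_minutes))


# Hypothetical suite: 100 one-minute tests plus one 20-minute test.
suite = [1.0] * 100 + [20.0]
for shards in (1, 4, 8, 16):
    print(shards, best_case_wall_clock(suite, shards, overhead_minutes=2.0))
```

In this example the bound stops improving between 8 and 16 shards because the single 20-minute test becomes the limit, which is why profiling comes before picking a shard count.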
Facing Similar Challenges?
Book a free 30-minute audit and I will tell you what I see.