Healthtech: Monolith to Microservices Without Stopping Delivery
A 250K-patient digital health platform was operating a 6-year-old Rails monolith. Feature velocity had dropped 70% in 18 months as the codebase grew too large for the team to modify safely. We extracted four critical services over 12 weeks while the product team continued shipping.
The Challenge
The monolith had 380K lines of Rails code, a 600-table PostgreSQL schema, and no service boundaries. Three 8-engineer teams constantly stepped on each other's migrations and blocked each other's deploys. A previous two-year full rewrite, backed by the CEO, had already failed. The constraint was non-negotiable: product development could not stop during the migration.
The Approach
We used the Strangler Fig pattern: extract services at the edges of the monolith, not the middle. We identified the four services with the most coupling pain and the clearest data ownership boundaries: notifications, appointment scheduling, billing, and document storage. Each extraction was a separate 3-week engagement.
The Implementation
Event-driven extraction with Kafka
We introduced Kafka as the communication layer between the monolith and extracted services. The monolith publishes events (AppointmentCreated, BillingInvoiceGenerated) and the new services consume them. This allowed the monolith and services to evolve independently without synchronous coupling.
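A minimal sketch of the producer side. The event envelope fields and the in-memory producer are illustrative, not the platform's real schema; in production the producer would be an actual Kafka client such as rdkafka or ruby-kafka.

```ruby
require "json"
require "time"
require "securerandom"

# Hypothetical envelope the monolith publishes when an appointment is created.
def appointment_created_event(appointment)
  {
    event_type:  "AppointmentCreated",
    event_id:    SecureRandom.uuid,
    occurred_at: Time.now.utc.iso8601,
    payload: {
      appointment_id: appointment[:id],
      patient_id:     appointment[:patient_id],
      starts_at:      appointment[:starts_at]
    }
  }
end

# Stand-in for a real Kafka producer: records deliveries so the sketch runs anywhere.
class InMemoryProducer
  attr_reader :deliveries

  def initialize
    @deliveries = []
  end

  def deliver(topic:, key:, value:)
    @deliveries << { topic: topic, key: key, value: value }
  end
end

def publish_appointment_created(producer, appointment)
  event = appointment_created_event(appointment)
  producer.deliver(
    topic: "appointments",
    key:   event[:payload][:appointment_id].to_s, # keyed for per-appointment ordering
    value: JSON.generate(event)
  )
  event
end
```

Keying messages by appointment ID keeps all events for one appointment on the same partition, so consumers see them in order.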
Database per service with sync period
Each new service got its own PostgreSQL database. During a 2-week sync period, writes went to both the monolith and the new service database. After validation, the monolith stopped writing to those tables and the new service became the system of record.
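The dual-write phase can be sketched like this, with plain hashes standing in for the monolith's tables and the new service's database; the class and key names are illustrative.

```ruby
# During the sync period every write lands on both sides; before cutover a
# validation pass confirms the new store agrees with the monolith.
class DualWriter
  attr_reader :mismatches

  def initialize(legacy_store, service_store)
    @legacy     = legacy_store
    @service    = service_store
    @mismatches = []
  end

  # Dual write: monolith tables and the new service database get the same record.
  def write(key, record)
    @legacy[key]  = record
    @service[key] = record
  end

  # Returns true only when every legacy record matches the service copy.
  def validate
    @mismatches = @legacy.keys.reject { |k| @legacy[k] == @service[k] }
    @mismatches.empty?
  end
end
```

Only after `validate` passes does the monolith stop writing and the new service become the system of record.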
API gateway for gradual traffic shift
We deployed Kong as an API gateway in front of both the monolith and new services. Traffic routing rules allowed us to shift 5% → 25% → 100% of requests to new services per endpoint, with instant rollback capability if error rates spiked.
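The routing logic is similar in spirit to this sketch; the endpoint names and percentages are illustrative, and in practice the rules lived in Kong's configuration rather than application code.

```ruby
require "zlib"

# Per-endpoint rollout percentages: what share of traffic hits the new service.
ROLLOUT = {
  "/api/notifications" => 100, # fully cut over
  "/api/appointments"  => 25,  # mid-rollout
  "/api/billing"       => 5    # canary
}.freeze

# Deterministic hashing keeps a given caller pinned to the same backend
# across requests; unknown endpoints stay on the monolith.
def route_to_new_service?(endpoint, caller_id)
  pct    = ROLLOUT.fetch(endpoint, 0)
  bucket = Zlib.crc32("#{endpoint}:#{caller_id}") % 100
  bucket < pct
end
```

Rolling back an endpoint is just dropping its percentage to 0, which is what made spikes in error rates cheap to react to.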
PHI boundary enforcement
Extracting services in a HIPAA environment required ensuring PHI never crossed service boundaries unencrypted and that each service maintained its own audit log. We added field-level encryption at the Kafka producer and decryption at the consumer, with a dedicated CloudWatch log group per service for its audit trail.
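A minimal sketch of the field-level encryption step using AES-256-GCM from Ruby's OpenSSL standard library. Key management (KMS, rotation) is omitted, and the PHI field names are illustrative.

```ruby
require "openssl"
require "base64"

# Fields treated as PHI in this sketch; real classification was per-schema.
PHI_FIELDS = %i[patient_name ssn].freeze

# Encrypt one field; the 12-byte IV and 16-byte auth tag travel with the ciphertext.
def encrypt_field(key, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  Base64.strict_encode64(iv + ciphertext + cipher.auth_tag)
end

def decrypt_field(key, encoded)
  raw = Base64.strict_decode64(encoded)
  iv, ciphertext, tag = raw[0, 12], raw[12..-17], raw[-16..]
  decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  decipher.key = key
  decipher.iv = iv
  decipher.auth_tag = tag
  decipher.update(ciphertext) + decipher.final
end

# Encrypt only the PHI fields of an event payload before it is published.
def encrypt_phi(key, payload)
  payload.to_h { |k, v| [k, PHI_FIELDS.include?(k) ? encrypt_field(key, v) : v] }
end
```

Non-PHI fields stay in the clear so consumers can route and filter events without decrypting anything.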
Key Takeaways
- Strangler Fig works - extract at the edges with clear data ownership, never start in the middle of the monolith
- Kafka as the communication layer allows the monolith and services to evolve at different speeds without synchronous coupling
- A 2-week dual-write sync period before cutting over is the safety net that makes database extraction non-scary
- Avoid the full rewrite - incremental extraction with a working monolith in production is almost always safer and faster
Facing Similar Challenges?
Book a free 30-minute audit and I will tell you what I see.