Deployment is the moment most engineering teams fear. Not because deployments are inherently dangerous - they are not - but because the team has not invested in making them safe. Three strategies exist to take that fear out of the equation. Here is how they work and when to use each.
## Why Your Current Approach Probably Has Downtime
If you are using a bare `kubectl apply` or a simple rolling update with no configuration, you likely have subtle downtime:
- Pods accepting traffic while containers are still initializing
- Old pods terminating while requests are in-flight
- Health checks not configured, so bad deployments go undetected for minutes
All three strategies below assume you fix these first:
```yaml
spec:
  containers:
    - name: api
      readinessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 3
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10
  terminationGracePeriodSeconds: 30
```
And in your container spec, add a preStop hook so in-flight requests have time to drain before the pod receives SIGTERM:
```yaml
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]
```
Now, the strategies.
## Strategy 1: Rolling Update (Default)
Rolling updates replace pods one by one. Kubernetes waits for each new pod to pass its readiness probe before terminating an old one. The result: no downtime, no extra infrastructure.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # max extra pods during rollout
      maxUnavailable: 0  # never drop below desired replica count
  template:
    # ...
```
With maxUnavailable: 0, Kubernetes will not remove an old pod until its replacement is healthy. With maxSurge: 1, only one extra pod runs at any time (controls cost).
When to use: Most teams, most of the time. Works for stateless APIs, background workers, anything that does not have complex state or backwards-incompatible API changes.
When it fails you: If you deploy a bad build, traffic hits the broken pods before you notice. With 4 replicas and maxSurge: 1, you will have 1 bad pod in the mix for several minutes while it rolls.
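One way to shrink that exposure window, sketched here rather than fully specified: `minReadySeconds` forces each new pod to stay ready for a quiet period before the rollout continues, and `progressDeadlineSeconds` marks a stalled rollout as failed so automation can react. The values below are illustrative, not recommendations:

```yaml
spec:
  minReadySeconds: 30           # new pod must stay ready this long before it counts
  progressDeadlineSeconds: 300  # mark the rollout failed if it stalls
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```

This does not stop a bad build from receiving some traffic, but it slows the roll enough for failing probes or alerts to halt it before every replica is replaced.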
## Strategy 2: Blue-Green Deployment
Blue-green runs two complete environments - blue (current) and green (new) - and switches traffic between them atomically using a Service selector swap.
Architecture:
- `deployment-blue` running the current version
- `deployment-green` running the new version
- One `Service` pointing to either blue or green via label selector
```yaml
# Blue deployment - current production
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-blue
  labels:
    app: api
    slot: blue
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api
      slot: blue
  template:
    metadata:
      labels:
        app: api
        slot: blue
    spec:
      containers:
        - name: api
          image: api:v1.5.2
---
# Green deployment - new version
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-green
  labels:
    app: api
    slot: green
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api
      slot: green
  template:
    metadata:
      labels:
        app: api
        slot: green
    spec:
      containers:
        - name: api
          image: api:v1.6.0
---
# Service - currently pointing to blue
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
    slot: blue  # change to "green" to switch traffic
  ports:
    - port: 80
      targetPort: 8080
```
To deploy: apply the green deployment, wait for all green pods to pass readiness, then patch the service:
```bash
# Deploy new version to green
kubectl apply -f api-green.yaml
kubectl rollout status deployment/api-green

# Run smoke tests against green (using a test service pointing at green)
./smoke-tests.sh green

# Switch traffic - atomic, sub-second
kubectl patch service api -p '{"spec":{"selector":{"slot":"green"}}}'

# Rollback is instant - patch back to blue
# kubectl patch service api -p '{"spec":{"selector":{"slot":"blue"}}}'
```
When to use: When you need instant rollback, when you want to test the new version before sending real traffic, or when you have API changes that are risky.
The trade-off: Doubles your compute cost during the deployment window. For a large fleet, this can be significant. It also requires careful handling of database schema changes - both versions need to handle the same schema simultaneously during the switch.
## Strategy 3: Canary Deployment
Canary routes a small percentage of production traffic to the new version, lets you observe it for a defined period, then promotes it to 100%.
The cleanest way to do this in Kubernetes is with a service mesh (Istio, Linkerd) or an ingress controller that supports traffic splitting (NGINX Ingress, Traefik, AWS ALB Ingress).
With NGINX Ingress and two Deployments:
```yaml
# Stable deployment (95% of traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-stable
spec:
  replicas: 19
  selector:
    matchLabels:
      app: api
      track: stable
  template:
    metadata:
      labels:
        app: api
        track: stable
    spec:
      containers:
        - name: api
          image: api:v1.5.2
---
# Canary deployment (5% of traffic via replica ratio)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
      track: canary
  template:
    metadata:
      labels:
        app: api
        track: canary
    spec:
      containers:
        - name: api
          image: api:v1.6.0
---
# Single Service selects both (no track label)
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api  # no track label - selects both deployments
  ports:
    - port: 80
      targetPort: 8080
```
With 19 stable + 1 canary replicas, ~5% of traffic hits the canary. Watch your metrics. If error rate stays flat, promote by updating the stable image and removing the canary:
```bash
kubectl set image deployment/api-stable api=api:v1.6.0
kubectl rollout status deployment/api-stable
kubectl delete deployment/api-canary
```
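The ~5% figure follows directly from the replica ratio. A quick back-of-the-envelope check, assuming the Service balances requests roughly evenly across ready pods:

```bash
# Approximate canary traffic share implied by the replica counts.
# Assumes even load balancing across ready pods - real traffic will vary.
stable_replicas=19
canary_replicas=1

share=$(awk -v s="$stable_replicas" -v c="$canary_replicas" \
  'BEGIN { printf "%.1f", 100 * c / (s + c) }')
echo "canary receives ~${share}% of traffic"
```

The same arithmetic tells you the granularity you can achieve: with 20 total replicas, 5% is the smallest step, which is one reason weight-based routing (below) scales better for fine-grained splits.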
For finer control, NGINX Ingress can split by weight, or route specific users by header, via a second Ingress annotated as a canary:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # 10% of traffic
    # or route specific users:
    # nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    # nginx.ingress.kubernetes.io/canary-by-header-value: "true"
spec:
  rules:
    - host: api.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-canary-service
                port:
                  number: 80
```
When to use: When you want real production traffic on the new version before full rollout. Ideal for high-traffic services where even a 1% error rate means thousands of users affected. Good for teams with solid observability - you need to be able to measure error rate and latency on the canary separately.
The trade-off: More complex to set up and automate. You need to define promotion criteria and either automate the promotion decision or have someone watching dashboards.
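At its core, a promotion gate reduces to comparing the canary's error rate against a threshold. A minimal sketch of that decision, with illustrative placeholder counts where your metrics system (Prometheus, Datadog, etc.) would supply real numbers:

```bash
# Promotion gate sketch: promote only if the canary's error rate
# stays below a threshold. Counts below are illustrative placeholders
# you would pull from your metrics system over the observation window.
canary_errors=12
canary_requests=2400
threshold_pct=1   # promote only below a 1% error rate

rate=$(awk -v e="$canary_errors" -v r="$canary_requests" \
  'BEGIN { printf "%.2f", 100 * e / r }')

if awk -v rate="$rate" -v t="$threshold_pct" 'BEGIN { exit !(rate < t) }'; then
  decision="promote"
else
  decision="hold"
fi
echo "$decision (canary error rate ${rate}%)"
```

A production gate would also compare the canary against the stable baseline (not just an absolute threshold) and check latency percentiles, but the shape of the automation is the same.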
## Which One Should You Use?
| | Rolling | Blue-Green | Canary |
|---|---|---|---|
| Infrastructure overhead | None | 2× compute during deploy | Minimal |
| Rollback speed | Minutes | Seconds | Minutes |
| Detects bad deploys | After traffic hits all pods | Before traffic switch | On a small % of traffic |
| Complexity | Low | Medium | High |
| Good for | Most workloads | Risk-averse, critical APIs | High-traffic, well-monitored |
Default recommendation: Start with Rolling. Configure maxUnavailable: 0, add proper readiness probes, and add preStop sleep. This covers 80% of use cases.
Upgrade to Blue-Green when you have had a bad deploy affect users and need instant rollback, or when your API is under high-stakes load (payments, auth).
Add Canary when your traffic volume is large enough that 1% of bad traffic is still meaningful, and when you have enough observability to make automated promotion decisions.
Not sure which strategy fits your system? Book a free audit - we will review your deployment setup and tell you what risk profile you are currently accepting.