Blue-Green Deployment on Kubernetes: Zero Downtime in Production

Blue-green deployment on Kubernetes guarantees zero-downtime deployment by maintaining two identical production environments. For any infrastructure engineer pursuing CKAD certification, this strategy represents the safest method to instantly switch to a new version. With 82% of container users in production on Kubernetes, mastering blue-green becomes essential.

TL;DR: Blue-green maintains two complete environments (blue = current, green = new). The switch is done by modifying the Service selector. Instant rollback if problems occur.

This skill is at the core of the LFS458 Kubernetes Administration training.

What is blue-green deployment for infrastructure engineers pursuing CKAD certification?

Blue-green deployment is a deployment strategy that uses two identical production environments called "blue" and "green". At any time, only one environment receives production traffic while the other serves as staging for the new version.

This technique offers:

Zero downtime: the switch is instantaneous
Instant rollback: return to old version in one command
Complete validation: test the new version before exposing to real traffic

Key takeaway: Blue-green deployment doubles required resources but eliminates any downtime risk.

The Kubernetes Deployment and Production guide covers the fundamentals to master before implementing this strategy.

How does blue-green deployment work on Kubernetes?

Basic architecture

┌─────────────────────────────────────────────────┐
│                    Service                       │
│              selector: version=blue              │
└──────────────────────┬──────────────────────────┘
│
┌─────────────┴─────────────┐
▼                           ▼
┌─────────────────┐         ┌─────────────────┐
│   Deployment    │         │   Deployment    │
│     BLUE        │         │     GREEN       │
│   version=blue  │         │  version=green  │
│    (active)     │         │   (standby)     │
└─────────────────┘         └─────────────────┘

Phase 1: Initial deployment (Blue)

# deployment-blue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-blue
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: blue
template:
metadata:
labels:
app: myapp
version: blue
spec:
containers:
- name: myapp
image: myapp:1.0.0
ports:
- containerPort: 8080
resources:
requests:
cpu: 200m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 5
periodSeconds: 5

Service pointing to Blue

# service.yaml
apiVersion: v1
kind: Service
metadata:
name: myapp
spec:
selector:
app: myapp
version: blue  # Points to blue
ports:
- port: 80
targetPort: 8080

Phase 2: Green deployment

Deploy the new version without impacting traffic:

# deployment-green.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-green
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: green
template:
metadata:
labels:
app: myapp
version: green
spec:
containers:
- name: myapp
image: myapp:2.0.0
ports:
- containerPort: 8080
resources:
requests:
cpu: 200m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 5
periodSeconds: 5

Key takeaway: The green deployment is completely isolated from production traffic.

How does an infrastructure engineer with LFS458 training validate green before the switch?

Internal test Service

Create a dedicated Service to test green:

# service-green-test.yaml
apiVersion: v1
kind: Service
metadata:
name: myapp-green-test
spec:
selector:
app: myapp
version: green
ports:
- port: 80
targetPort: 8080

Automated tests

#!/bin/bash
GREEN_SERVICE="myapp-green-test"

echo "=== Green Environment Validation ==="

# Health check
curl -f http://${GREEN_SERVICE}/health || exit 1

# Smoke tests
curl -f http://${GREEN_SERVICE}/api/v1/status || exit 1

# Performance baseline
ab -n 1000 -c 10 http://${GREEN_SERVICE}/ > perf-results.txt

# Comparison with blue
if [ $(grep "Requests per second" perf-results.txt | awk '{print $4}') -lt 100 ]; then
echo "Performance degradation detected"
exit 1
fi

echo "Green validation passed"

Pod verification

# All green pods must be Ready
kubectl get pods -l app=myapp,version=green

# No recent restarts
kubectl get pods -l app=myapp,version=green -o jsonpath='{.items[*].status.containerStatuses[*].restartCount}'

The Kubernetes Autoscaling guide details how to configure scaling before the switch.

How to perform the blue to green switch?

kubectl patch method

# Switch to green
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'

# Verification
kubectl get service myapp -o jsonpath='{.spec.selector.version}'
# Output: green

Secure switch script

#!/bin/bash
set -e

SERVICE="myapp"
NEW_VERSION="green"
OLD_VERSION="blue"

echo "Switching from ${OLD_VERSION} to ${NEW_VERSION}..."

# Verify green is ready
READY_PODS=$(kubectl get pods -l app=myapp,version=${NEW_VERSION} -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}' | tr ' ' '\n' | grep -c True)
TOTAL_PODS=$(kubectl get pods -l app=myapp,version=${NEW_VERSION} --no-headers | wc -l)

if [ "$READY_PODS" -ne "$TOTAL_PODS" ]; then
echo "Error: Not all green pods are ready (${READY_PODS}/${TOTAL_PODS})"
exit 1
fi

# Perform the switch
kubectl patch service ${SERVICE} -p "{\"spec\":{\"selector\":{\"version\":\"${NEW_VERSION}\"}}}"

# Verify the switch
sleep 2
CURRENT=$(kubectl get service ${SERVICE} -o jsonpath='{.spec.selector.version}')

if [ "$CURRENT" = "$NEW_VERSION" ]; then
echo "Switch successful: now routing to ${NEW_VERSION}"
else
echo "Switch failed: still routing to ${CURRENT}"
exit 1
fi

Post-switch validation

# Verify traffic reaches green
for i in {1..10}; do
curl -s http://myapp/version
sleep 1
done

# Check metrics
kubectl top pods -l app=myapp,version=green

Key takeaway: The switch only modifies the Service selector. No pod is recreated.

How to perform instant rollback?

One-command rollback

# Return to blue
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'

Automated rollback script

#!/bin/bash
SERVICE="myapp"

# Determine current version
CURRENT=$(kubectl get service ${SERVICE} -o jsonpath='{.spec.selector.version}')

if [ "$CURRENT" = "green" ]; then
ROLLBACK_TO="blue"
else
ROLLBACK_TO="green"
fi

echo "Rolling back from ${CURRENT} to ${ROLLBACK_TO}..."

kubectl patch service ${SERVICE} -p "{\"spec\":{\"selector\":{\"version\":\"${ROLLBACK_TO}\"}}}"

echo "Rollback complete"

Automatic rollback conditions

Integrate monitoring that triggers rollback:

# prometheus-rule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: blue-green-rollback
spec:
groups:
- name: deployment
rules:
- alert: HighErrorRateAfterSwitch
expr: |
sum(rate(http_requests_total{status=~"5.*",version="green"}[5m]))
/
sum(rate(http_requests_total{version="green"}[5m])) > 0.05
for: 2m
labels:
severity: critical
annotations:
summary: "Error rate > 5% on green deployment"

The Kubernetes Scaling Problems guide helps diagnose post-switch issues.

Blue-green comparison vs other strategies

Criteria	Blue-Green	Rolling Update	Canary
Downtime	Zero	Minimal	Zero
Resources	2x	1x + buffer	1x + 10-20%
Rollback	Instant	Progressive	Instant
Pre-prod validation	Complete	Limited	Partial
Complexity	Medium	Low	High
Exposure risk	None before switch	Progressive	Controlled (%)

Consult the Rolling Update Kubernetes comparison for detailed analysis.

Automate blue-green with Argo Rollouts

Installation

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

Rollout configuration

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: myapp
spec:
replicas: 3
strategy:
blueGreen:
activeService: myapp-active
previewService: myapp-preview
autoPromotionEnabled: false
scaleDownDelaySeconds: 30
prePromotionAnalysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: myapp-preview
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: myapp:1.0.0
ports:
- containerPort: 8080

Associated services

# Active service (production)
apiVersion: v1
kind: Service
metadata:
name: myapp-active
spec:
selector:
app: myapp
ports:
- port: 80
targetPort: 8080
---
# Preview service (test)
apiVersion: v1
kind: Service
metadata:
name: myapp-preview
spec:
selector:
app: myapp
ports:
- port: 80
targetPort: 8080

Manual promotion

# View status
kubectl argo rollouts get rollout myapp

# Promote green
kubectl argo rollouts promote myapp

# Abort and rollback
kubectl argo rollouts abort myapp

Key takeaway: Argo Rollouts automatically manages both deployments and the Service switch.

The CI/CD Pipeline for Kubernetes guide shows Argo Rollouts integration.

Best practices for zero-downtime Kubernetes deployment

1. Database and migrations

Blue-green requires database compatibility between versions:

Version N (blue) ─────┐
├──► Database (compatible N and N+1)
Version N+1 (green) ──┘

Recommended strategy:

Backward-compatible schema migration
Green deployment with new code
Switch to green
Cleanup of old columns/tables

2. User sessions

Externalize sessions:

# Redis for sessions
env:
- name: SESSION_STORE
value: redis://redis-cluster:6379

3. Feature flags

Combine blue-green and feature flags for maximum control:

env:
- name: FEATURE_NEW_CHECKOUT
value: "false"  # Enabled after validation

4. Differentiated monitoring

Label metrics by version:

metadata:
labels:
app: myapp
version: green
deployment-type: blue-green

According to the CNCF 2025 report, 104,000 people have taken the CKA exam with 49% annual growth. Blue-green deployment is part of the expected skills.

When to use blue-green deployment?

Ideal use cases

Stateless applications: APIs, stateless microservices
Major changes: refactoring, framework migration
Strict SLA requirements: contractual zero downtime
Complete validation required: end-to-end tests before production

Cases to avoid

Limited resources: doubling resources is prohibitive
Complex stateful workloads: databases, file systems
Frequent deployments: maintenance cost becomes high

The Kubernetes Tutorials and Practical Guides offer exercises on each strategy.

Blue-green cost management

Asymmetric scaling

Reduce standby costs:

# Green at 1 replica during tests
kubectl scale deployment myapp-green --replicas=1

# Scale up before switch
kubectl scale deployment myapp-green --replicas=3

# Scale down blue after validation
kubectl scale deployment myapp-blue --replicas=0

Automated cleanup

# Delete old deployment after 24h of stability
kubectl delete deployment myapp-blue

The Deploy with Helm Charts guide shows how to package this logic.

Train on zero-downtime Kubernetes deployment

The CKAD exam tests these skills with a 66% pass rate over 2 hours. The LFD459 Kubernetes for Developers training of 3 days prepares you for it. For a more operational approach, the LFS458 Kubernetes Administration training of 4 days covers advanced deployment strategies.

Check upcoming sessions and start your certification path.

Key Takeaways