## Key Takeaways

- ✓ Canary deployment routes 5-10% of traffic to the new version before full rollout
- ✓ Flagger automates analysis and rollback against configurable success thresholds (for example, error rate below 1% and a P99 latency ceiling)
The canary deployment strategy lets Kubernetes infrastructure engineers ship new versions while limiting risk exposure. This progressive deployment technique routes a small percentage of traffic to the new version before a complete rollout. With 82% of container users running Kubernetes in production, mastering this strategy is essential.

TL;DR: Canary deployment progressively exposes 5-10% of traffic to a new version, validates metrics under real conditions, then either gradually increases traffic to 100% or rolls back automatically.
To master these skills, explore the LFS458 Kubernetes Administration training.
## What is canary deployment for Kubernetes infrastructure engineers?
Canary deployment is a deployment strategy that introduces a new application version to a subset of users before extending it to all traffic. The term comes from canaries used in mines to detect toxic gases.
This approach differs from standard rolling update by its granular traffic control. Where a rolling update progressively replaces all pods, canary maintains two versions simultaneously with configurable traffic distribution.
Key takeaway: Canary deployment validates a version in real production with minimal blast radius.
If you use Kubernetes, canary deployment represents an essential practice for minimizing deployment risks.
The Kubernetes deployment and production guide covers the fundamentals to know before implementing this strategy.
## How does native Kubernetes progressive deployment work?

### Basic architecture with two Deployments
The native method uses two distinct Deployments pointing to the same Service:
```yaml
# deployment-stable.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: myapp
      version: stable
  template:
    metadata:
      labels:
        app: myapp
        version: stable
    spec:
      containers:
        - name: myapp
          image: myapp:1.0.0
---
# deployment-canary.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      version: canary
  template:
    metadata:
      labels:
        app: myapp
        version: canary
    spec:
      containers:
        - name: myapp
          image: myapp:1.1.0
```
### Unified Service
The Service selects pods from both Deployments via the common label:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp  # Selects both stable AND canary
  ports:
    - port: 80
      targetPort: 8080
```
With 9 stable replicas and 1 canary replica, approximately 10% of traffic reaches the new version.
### Native approach limitations
This method has constraints:
- Limited granularity: distribution depends on pod ratio
- No intelligent routing: impossible to target specific users
- Manual rollback: requires intervention to remove the canary
The Kubernetes deployment strategies guide compares this approach to the alternatives.
## How do Kubernetes infrastructure engineers implement progressive delivery with Istio?

### Traffic splitting configuration
Istio enables precise routing control via VirtualService:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: stable
          weight: 95
        - destination:
            host: myapp
            subset: canary
          weight: 5
```
### DestinationRule for subsets
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: stable
      labels:
        version: stable
    - name: canary
      labels:
        version: canary
```
### Header-based routing
Target specific users for testing:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: myapp
            subset: canary
    - route:
        - destination:
            host: myapp
            subset: stable
```
Key takeaway: Progressive delivery with Istio offers granular traffic control that native Deployments cannot match.
The GitOps and Kubernetes guide explains how to automate these configurations.
## Automate canary deployment with Flagger

### Installation and configuration
Flagger automates the analysis and promotion of canary deployments:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
```
### Automated workflow
Flagger executes this workflow:
1. Detection: a new image in the Deployment
2. Creation: a canary deployment with 0% traffic
3. Analysis: progressive increments (10% → 20% → ... → 50%)
4. Validation: metrics verified at each step
5. Promotion: move to 100% on success, rollback on failure
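During the analysis phase the canary needs traffic for its metrics to mean anything. Flagger supports webhooks for this; a minimal sketch, assuming Flagger's optional load tester is deployed in a `test` namespace (the URL and `cmd` values below are illustrative, not part of this article's setup):

```yaml
analysis:
  webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        # Generate 1 minute of synthetic traffic against the canary at each step
        cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary.test/"
```

Without a traffic source, low-volume services can pass analysis trivially because no requests ever fail.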
### Prometheus integration
Configure custom metrics:
```yaml
metrics:
  - name: error-rate
    templateRef:
      name: error-rate
      namespace: flagger-system
    thresholdRange:
      max: 1
    interval: 1m
```
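The `templateRef` above points at a MetricTemplate object that must exist in the cluster. A minimal sketch of one, assuming a Prometheus instance reachable at the address shown and the same `http_requests_total` metric used later in this article (both are assumptions about your environment):

```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    # Assumed in-cluster Prometheus address; adjust to your setup
    address: http://prometheus.monitoring:9090
  query: |
    100 * sum(rate(http_requests_total{status=~"5.*", app="myapp"}[1m]))
    /
    sum(rate(http_requests_total{app="myapp"}[1m]))
```

The query returns a percentage, which is why the Canary's `thresholdRange` uses `max: 1` for a 1% error budget.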
The CI/CD pipeline for Kubernetes guide details how to integrate Flagger into your deployment chain.
## Canary vs other deployment strategies
| Criteria | Canary | Rolling Update | Blue-Green |
|---|---|---|---|
| Exposure risk | Minimal (5-10%) | Progressive (per pod) | Total (100% at switch) |
| Rollback | Instant | Progressive | Instant |
| Required resources | +10-20% | 0% additional | +100% |
| Complexity | High | Low | Medium |
| Production validation | Yes | Limited | Yes (after switch) |
| Granular routing | Yes (with service mesh) | No | No |
The Helm vs Kustomize guide compares tools for managing these configurations.
## What metrics to monitor during a canary deployment?

### Performance metrics
- P99 Latency: 99th percentile response time
- Throughput: requests per second processed
- Error rate: percentage of 5xx errors
### Business metrics
- Conversion rate: impact on business KPIs
- Bounce rate: user abandonment rate
- Revenue per request: for e-commerce applications
### Infrastructure metrics
- CPU/Memory utilization: resource consumption
- Pod restarts: container stability
- Network errors: connectivity issues
```promql
# Prometheus query for error rate by version
sum(rate(http_requests_total{status=~"5.*", app="myapp"}[5m])) by (version)
/
sum(rate(http_requests_total{app="myapp"}[5m])) by (version)
```
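P99 latency can be compared per version the same way. A sketch, assuming your application exposes a standard `http_request_duration_seconds` histogram (the metric name depends on your instrumentation):

```promql
# P99 response time by version, over a 5-minute window
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket{app="myapp"}[5m])) by (le, version)
)
```

Grouping by `version` on both queries is what lets you compare canary and stable side by side rather than looking at an aggregate that the 90% stable traffic would dominate.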
Key takeaway: Define your success criteria before deployment. A canary without metrics is a blind rollout.
The Kubernetes tutorials and practical guides offer hands-on monitoring exercises.
## Best practices for a successful Kubernetes canary deployment strategy

### 1. Start small

Start with 1-5% of traffic and increase in increments of at most 10%.
### 2. Define clear failure criteria
Automate rollback if:
- Error rate > 1%
- P99 latency > 2x baseline
- Critical alerts triggered
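These criteria map directly onto Flagger's analysis block. A sketch, assuming a P99 baseline of roughly 250 ms (so 500 ms approximates the 2x-baseline rule; adjust to your own measurements):

```yaml
analysis:
  threshold: 5                # roll back after 5 consecutive failed checks
  metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99               # error rate above 1% fails the check
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500              # ms; ~2x an assumed 250 ms baseline
      interval: 1m
```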
### 3. Test in real conditions
The canary must receive representative traffic:
- Peak hours included
- All request types
- Sufficient duration (minimum 30 minutes)
### 4. Prepare the rollback
```bash
# Immediate rollback with kubectl
kubectl rollout undo deployment/myapp-canary

# With Flagger, revert to the stable image; a failed analysis also rolls back automatically
kubectl set image deployment/myapp myapp=myapp:1.0.0
```
The Kubernetes autoscaling guide explains how to ensure your canary scales correctly under load.
## Real use cases for Kubernetes progressive deployment

### Database migration
Validate schema changes with a limited canary before complete migration.
### Major refactoring
Test a service rewrite with a low percentage of real traffic.
### Feature flags
Combine canary deployment and feature flags for maximum control.
According to the CNCF 2025 report, 104,000 people have taken the CKA exam with 49% annual growth. Mastery of canary deployment is part of the expected skills.
## Take action with certifying training
The LFS458 Kubernetes Administration training covers advanced deployment strategies over 4 days. For developers, the 3-day LFD459 Kubernetes for Developers training prepares you for the CKAD exam.
Check the upcoming sessions calendar and book your spot.