Key Takeaways
- ✓ 150 microservices migrated from Docker Swarm to Kubernetes in 8 months with zero downtime
- ✓ 40% reduction in incidents and 3x faster deployments after migration
- ✓ LFS458 training and progressive migration: keys to success for a CAC 40 enterprise
This practical Kubernetes administration case study documents the production cluster migration of a CAC 40 company from Docker Swarm to Kubernetes. The transformation involved 150 microservices, 40 developers, and 8 months of intensive work. With 96% of organizations using or evaluating Kubernetes, this type of migration has become a strategic imperative for IT leaders.
TL;DR: Docker Swarm to Kubernetes migration for a large enterprise: 150 microservices, 8 months, zero downtime. Keys to success: team training (LFS458), progressive migration, complete monitoring stack. Result: 40% reduction in incidents, 3x faster deployments.
To master these skills, discover the LFS458 Kubernetes Administration training.
What Was the Context for This Production Kubernetes Cluster Migration?
The company, a major player in the financial sector, had been operating its application infrastructure on Docker Swarm since 2019. Transaction volume growth and Swarm limitations motivated the migration.
"The VMware acquisition is influencing my decision making right now, heavily." - Enterprise CTO, Spectro Cloud State of Kubernetes 2025
This quote reflects the concerns of current IT decision-makers facing market changes.
Initial Situation
| Metric | Docker Swarm (before) | Kubernetes Target |
|---|---|---|
| Microservices | 150 | 150+ |
| Deployments/day | 5-10 | 30+ |
| Incident MTTR | 45 min | < 15 min |
| Availability | 99.5% | 99.9% |
| Rollback time | 15 min | < 2 min |
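The availability targets in the table above translate into concrete downtime budgets. A quick calculation (assuming a 30-day month) shows what moving from 99.5% to 99.9% means in practice:

```shell
# Monthly downtime budget for a given availability percentage (30-day month)
budget() { awk -v a="$1" 'BEGIN { printf "%.1f min\n", (1 - a/100) * 30*24*60 }'; }
budget 99.5   # Swarm baseline: 216.0 min of downtime allowed per month
budget 99.9   # Kubernetes target: 43.2 min per month
```

The target is a fivefold reduction in tolerated downtime, which is why the migration plan below leans so heavily on rollback speed and parallel environments.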
Kubernetes cluster administration covers the skills needed for such migrations.
Docker Swarm Limitations Identified
Docker Swarm, used by roughly 24% of organizations, presented several limitations:
- Scaling: Swarm suits small workloads, Kubernetes scales to thousands of containers
- Ecosystem: Limited tools compared to the CNCF ecosystem
- Recruitment: Difficulty finding Swarm profiles, abundance of Kubernetes skills
- Vendor support: Reduced Docker investment in Swarm
Key takeaway: Migration to Kubernetes is not purely technical. Ecosystem, recruitment, and long-term support factors weigh in the decision.
What Methodology Was Used for This Production Kubernetes Cluster Migration?
The infrastructure team structured the project in progressive phases, minimizing risks.
Phase 1: Training and Skills Development (Months 1-2)
Train teams before migrating. This phase included:
- Infrastructure team (8 people) -> LFS458 Kubernetes Administration
- Development team (32 people) -> LFD459 Kubernetes for Developers
- Security team (4 people) -> LFS460 Kubernetes Security
"Anybody can learn Kubernetes. With abundant documentation and development tools available online, teaching yourself Kubernetes is very much within reach."
However, structured training significantly accelerates skill development. The LFD459 Kubernetes for Developers training details this path.
Phase 2: Infrastructure and CI/CD (Months 2-4)
Deploy the target infrastructure in parallel with Swarm production:
```bash
# Provisioning an HA kubeadm cluster
kubeadm init --control-plane-endpoint "k8s-api.internal:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12

# Installing the Calico CNI
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml

# Configuring the NGINX Ingress controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.replicaCount=3 \
  --set controller.nodeSelector."kubernetes\.io/os"=linux
```
The Complete guide: installing a multi-node Kubernetes cluster with kubeadm details these procedures.
Key takeaway: Maintain both environments in parallel during migration. Only decommission Swarm after complete validation on Kubernetes.
Phase 3: Progressive Service Migration (Months 4-7)
Migration followed a wave-based approach:
| Wave | Services | Criticality | Duration |
|---|---|---|---|
| 1 | Internal tools, monitoring | Low | 2 weeks |
| 2 | Non-critical APIs | Medium | 4 weeks |
| 3 | Secondary business services | High | 4 weeks |
| 4 | Core banking, payments | Critical | 6 weeks |
```yaml
# Example migrated manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: production
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
        version: v2.3.1
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: payment-service
            topologyKey: kubernetes.io/hostname
      containers:
      - name: payment
        image: registry.internal/payment:v2.3.1
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"
```
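The rollout arithmetic behind `replicas: 5`, `maxSurge: 2`, and `maxUnavailable: 0` is worth spelling out: Kubernetes may run up to 7 pods at the peak of an update and never drops below 5 ready pods, which is what makes zero-downtime deployments possible for this service.

```shell
# Pod-count bounds during a rolling update of the manifest above
replicas=5; max_surge=2; max_unavailable=0
peak=$((replicas + max_surge))             # transient maximum pod count
min_ready=$((replicas - max_unavailable))  # guaranteed ready pods at all times
echo "peak pods during rollout: $peak"
echo "minimum ready pods: $min_ready"
```

The anti-affinity rule adds one more constraint: the cluster needs at least 7 schedulable nodes for an update to proceed, since each pod must land on a distinct hostname.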
Phase 4: Validation and Cutover (Months 7-8)
The final cutover used a blue-green approach at the DNS level:
```bash
# Pre-cutover verification: list any pod not in the Running phase
kubectl get pods -n production --field-selector=status.phase!=Running

# Endpoint validation
kubectl get endpoints -n production

# Load testing
k6 run --vus 100 --duration 30m load-test.js

# DNS cutover
aws route53 change-resource-record-sets --hosted-zone-id Z123 \
  --change-batch file://cutover.json
```
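The contents of `cutover.json` are not shown in the runbook; a hypothetical example follows (the hostname, TTL, and target value are illustrative assumptions, not the company's actual records):

```json
{
  "Comment": "Blue-green cutover: point the API record at the Kubernetes ingress",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.internal.",
        "Type": "CNAME",
        "TTL": 60,
        "ResourceRecords": [
          { "Value": "ingress.k8s.example.internal." }
        ]
      }
    }
  ]
}
```

A low TTL (60 seconds here) keeps the exposure window short if the record ever has to be flipped back to the Swarm entry point.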
What Technical Challenges Were Encountered?
The migration revealed several challenges specific to the large enterprise context.
Network and Segmentation
Network configuration required particular attention. Configuring Kubernetes cluster networking: CNI Services Ingress covers these aspects.
```yaml
# Strict NetworkPolicy for financial services
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-isolation
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          zone: trusted
    - podSelector:
        matchLabels:
          role: api-gateway
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          zone: database
    ports:
    - protocol: TCP
      port: 5432
```
Key takeaway: Implement NetworkPolicies from the start. Adding them after migration on a production cluster is riskier.
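NetworkPolicies are additive and only constrain the pods they select; pods matched by no policy remain wide open. The payment-isolation rules above therefore assume a namespace-wide default-deny baseline, sketched here:

```yaml
# Default-deny baseline: selects every pod in the namespace, allows no traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```

With this in place, each service-specific policy becomes an explicit allow-list on top of a closed network.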
Secrets Management
Migration of secrets from Docker secrets to Kubernetes Secrets then to Vault:
```bash
# Export Docker Swarm secrets
# Note: `docker secret inspect` does not return secret values; they are only
# readable from inside a container that mounts them under /run/secrets.
docker secret ls --format "{{.Name}}" | while read -r name; do
  docker service create --name "reader-$name" --secret "$name" \
    --restart-condition=none alpine sleep 300 >/dev/null
  sleep 5  # wait for the task to start
  cid=$(docker ps -q --filter "name=reader-$name")
  docker exec "$cid" cat "/run/secrets/$name" > "secrets/$name"
  docker service rm "reader-$name" >/dev/null
done

# Import into Kubernetes (temporary)
kubectl create secret generic legacy-secrets \
  --from-file=secrets/ \
  -n production

# Migration to Vault (target)
vault kv put secret/production/payment-service \
  db_password=@secrets/db_password \
  api_key=@secrets/api_key
```
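Once secrets live in Vault, pods can consume them without Kubernetes Secrets at all, for example via the Vault Agent injector. A sketch of the pod-template annotations follows (the `payment` Vault role is an assumption and must exist in Vault's Kubernetes auth configuration):

```yaml
# Pod template annotations for Vault Agent sidecar injection (sketch)
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "payment"
    # Renders the secret to /vault/secrets/db_password inside the pod
    vault.hashicorp.com/agent-inject-secret-db_password: "secret/data/production/payment-service"
```

This removes base64-encoded secrets from etcd entirely and centralizes rotation in Vault.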
Performance Problem Debugging
The kubectl cheatsheet: essential commands for administration was distributed to all teams:
```bash
# Identify resource-hungry pods
kubectl top pods -n production --sort-by=cpu

# Analyze recent events
kubectl get events -n production --sort-by='.lastTimestamp' | tail -20

# Check ResourceQuotas
kubectl describe resourcequota -n production

# Network debugging with an ephemeral container
kubectl debug -it pod/payment-xyz --image=nicolaka/netshoot -- bash
```
What Results Were Achieved After Migration?
Post-migration metrics validated the project's success.
| Metric | Before (Swarm) | After (K8s) | Improvement |
|---|---|---|---|
| Deployments/day | 8 | 35 | +337% |
| Incident MTTR | 45 min | 12 min | -73% |
| Availability | 99.52% | 99.94% | +0.42 pts |
| Rollback time | 15 min | 90 sec | -90% |
| Incidents/month | 12 | 7 | -42% |
"Kubernetes is no longer experimental but foundational. Soon, it will be essential to AI as well." - Chris Aniszczyk, CNCF State of Cloud Native 2026
Kubernetes application development benefited from this new infrastructure.
Migration ROI
Total investment included:
- Team training: 44 people trained across 3 curricula
- Infrastructure: 3 months of parallel environments
- External consulting: Kubernetes architect support
Positive ROI was achieved in 14 months thanks to operational savings and deployment acceleration.
Key takeaway: Invest in training before migration. Competent teams reduce risks and accelerate the project.
What Lessons Can Be Learned from This Kubernetes Administration Case Study?
This migration generated learnings applicable to other contexts.
What Worked
- Pre-migration training: Teams autonomous from the start of migration
- Progressive migration: Risks contained through successive waves
- Parallel environments: Rollback possible at any time
- Complete documentation: Runbooks for each migrated service
What Could Be Improved
- Earlier load testing: Late discovery of network bottlenecks
- GitOps from the start: Implemented after migration, should have been initial
- Unified observability: Swarm and K8s metrics difficult to correlate during transition
Resolving the 10 most common Kubernetes cluster problems would have helped anticipate certain challenges.
Recommendations for Similar Projects
1. Train BEFORE migrating (minimum 1 month before)
2. Start with non-critical services
3. Maintain Swarm/legacy in parallel 2-3 months
4. Automate regression tests
5. Document each step for audit
Next Steps for Your Migration Project
This production Kubernetes cluster migration experience demonstrates the feasibility of large-scale transformations with a structured methodology. According to Spectro Cloud, 80% of organizations now run Kubernetes in production with an average of 20+ clusters.
"Just given the capabilities that exist with Kubernetes, and the company's desire to consume more AI tools, we will use Kubernetes more in future." - Enterprise CTO, Spectro Cloud State of Kubernetes 2025
Prepare your team with appropriate training:
- LFS458 Kubernetes Administration: 4 days to master cluster administration, CKA preparation
- LFD459 Kubernetes for Application Developers: 3 days for development teams, CKAD preparation
- LFS460 Kubernetes Security Fundamentals: 4 days for advanced security, CKS preparation
- Kubernetes Fundamentals: 1 day to discover Kubernetes
Contact our teams to plan your team's training before your migration project.