migration7 min read

Update a Kubernetes Cluster Without Service Interruption

SFEIR Institute

Key Takeaways

  • Three phases: control plane first, then workers one by one, finally validation.
  • Plan 2 to 4 hours to update a 10-node cluster without interruption.
  • 'Critical rule: never more than one minor version difference between components.'

A Kubernetes rolling upgrade refers to the progressive update of cluster components (control plane then worker nodes) without service interruption, keeping workloads available throughout the process.

You must master this procedure to ensure continuity of your applications in production.

TL;DR: You update your cluster in three phases: control plane first, then worker nodes one by one with drain/uncordon, and finally validation. The golden rule: never more than one minor version difference between components. Plan 2 to 4 hours for a 10-node cluster.

To master these critical skills, discover the LFS458 Kubernetes Administration training.

Key takeaway: 82% of container users run Kubernetes in production (CNCF Annual Survey 2025). Your ability to maintain these clusters up to date without downtime becomes an essential skill.

Why Must You Plan Your Kubernetes Updates?

According to the Spectro Cloud State of Kubernetes 2025 report, IT teams spend an average of 34 working days per year resolving Kubernetes issues. A poorly prepared update represents a significant portion of these incidents.

You face several risks if you neglect planning:

RiskImpactMitigation
Version skewAPI incompatibility between componentsRespect n+1/n-1 rule
Incomplete drainOrphaned pods, lost dataPodDisruptionBudgets configured
Impossible rollbackExtended downtimeetcd snapshots before upgrade
API deprecationBroken manifests post-upgradeAudit with kubectl convert

To deepen Kubernetes cluster administration fundamentals, consult our dedicated hub.

What Prerequisites to Check Before Launching Your Upgrade?

Validate these points before any intervention on your cluster.

Current Version Check

# Control plane component versions
kubectl version --short
kubectl get nodes -o wide

# kubeadm version on each node
kubeadm version

You must identify the target version. Kubernetes follows a release cycle every 4 months. The official documentation indicates that only the last three minor versions receive security patches.

Deprecated API Audit

# Identify resources using obsolete APIs
kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis

# Use kubectl convert for migration
kubectl convert -f deployment.yaml --output-version apps/v1
Key takeaway: The Ingress NGINX Controller is retiring in March 2026 (InfoQ). You must check your dependencies before each upgrade.

PodDisruptionBudgets Configuration

You protect your critical workloads with PDBs:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: api-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app: api-server

To resolve issues you might encounter, refer to our guide on the 10 most common Kubernetes cluster problems.

How to Prepare Your Environment for a Rolling Upgrade?

Mandatory etcd Backup

Execute this backup before any control plane modification.

# Connect to etcd pod
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d).db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key

# Integrity verification
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-$(date +%Y%m%d).db

You must store this snapshot outside the cluster. In case of critical failure, it's your only restoration point.

Cluster State Validation

# Check node health
kubectl get nodes
kubectl get --raw='/healthz?verbose'

# Identify non-ready pods
kubectl get pods --all-namespaces | grep -v Running | grep -v Completed

# Check system resources
kubectl top nodes

If you encounter network issues during this phase, consult our guide to diagnose and resolve network problems in a Kubernetes cluster.

How to Update the Control Plane Step by Step?

Step 1: First Control Plane Node Upgrade

You start with the primary node. This operation updates the API server, controller-manager, scheduler, and etcd.

# Update kubeadm packages
sudo apt-mark unhold kubeadm
sudo apt-get update && sudo apt-get install -y kubeadm=1.30.0-1.1
sudo apt-mark hold kubeadm

# Check upgrade plan
sudo kubeadm upgrade plan

# Apply upgrade
sudo kubeadm upgrade apply v1.30.0

Step 2: kubelet and kubectl Upgrade

# Drain the control plane node
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Update kubelet and kubectl
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
sudo apt-mark hold kubelet kubectl

# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# Re-enable the node
kubectl uncordon <node-name>
Key takeaway: CKA certification requires a 66% score on a 2-hour practical exam (Linux Foundation). Upgrade procedures are an evaluated skill. If you're preparing for this certification, the LFS458 training covers these scenarios in detail.

Step 3: Secondary Control Plane Nodes Upgrade

You repeat the procedure on each additional control plane node:

# For each secondary control plane node
sudo kubeadm upgrade node

# Then drain, upgrade kubelet/kubectl, restart, uncordon

How to Migrate Your Worker Nodes Without Interruption?

Rolling Update Strategy for Workers

You update worker nodes one by one. This approach ensures your workloads always have sufficient resources.

# For each worker node
NODE=worker-01

# 1. Graceful drain
kubectl drain $NODE --ignore-daemonsets --delete-emptydir-data --grace-period=60

# 2. Verify pods are rescheduled
kubectl get pods -o wide --all-namespaces | grep $NODE

# 3. Upgrade kubeadm config
sudo kubeadm upgrade node

# 4. Upgrade kubelet
sudo apt-get install -y kubelet=1.30.0-1.1
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# 5. Re-enable
kubectl uncordon $NODE

To understand your node network configuration, refer to our guide on configuring Kubernetes cluster networking: CNI, Services, and Ingress.

StatefulSet Management During Upgrade

You must pay special attention to StatefulSets:

# Check PVCs attached to the node
kubectl get pvc --all-namespaces -o json | jq '.items[] | select(.spec.volumeName != null)'

# Check StatefulSets
kubectl get statefulsets --all-namespaces
Workload TypePrecautionCommand
StatefulSetWait for replica synckubectl rollout status sts/
DaemonSetAutomaticNone
DeploymentPDB requiredCheck minAvailable
JobComplete before drainkubectl wait --for=condition=complete

How to Validate Your Upgrade and Prepare Rollback?

Post-Upgrade Validation Checklist

Run these checks immediately after each step.

# 1. Consistent versions
kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion

# 2. Component health
kubectl get --raw='/healthz?verbose'
kubectl get pods -n kube-system

# 3. Application tests
kubectl run test-pod --image=nginx --restart=Never
kubectl expose pod test-pod --port=80 --type=ClusterIP
kubectl run curl-test --image=curlimages/curl --rm -it --restart=Never -- curl test-pod

# 4. Cleanup
kubectl delete pod test-pod
kubectl delete svc test-pod

Rollback Plan

You must prepare your rollback before starting the upgrade:

# etcd restoration (only if critical corruption)
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-backup.db \
--data-dir=/var/lib/etcd-restore \
--initial-cluster=master=https://10.0.0.1:2380 \
--initial-advertise-peer-urls=https://10.0.0.1:2380

# kubelet downgrade (avoid if possible)
sudo apt-get install -y kubelet=1.29.0-1.1
sudo systemctl restart kubelet
Key takeaway: Kubernetes downgrade is not officially supported. Your etcd backup remains your primary safety net. Test restoration regularly.

For a complete view of best practices, consult our article on securing a Kubernetes cluster: essential best practices.

What Common Errors to Avoid?

You must know the classic pitfalls:

Error 1: Ignoring version skew

# Bad: skip multiple versions
# v1.27 → v1.30 directly

# Correct: incremental upgrade
# v1.27 → v1.28 → v1.29 → v1.30

Error 2: Forgetting third-party components

Before your upgrade, check compatibility of:

  • CNI plugins (Calico, Cilium, Flannel)
  • Ingress controllers
  • Service meshes (Istio, Linkerd)
  • Monitoring stack (Prometheus, Grafana)

Error 3: Neglecting post-upgrade load tests

# Basic load test with kubectl
kubectl run load-test --image=busybox --rm -it -- sh -c 'for i in $(seq 1 100); do wget -q -O- http://service; done'

According to Chris Aniszczyk, CNCF CTO: "Kubernetes is no longer experimental but foundational. Soon, it will be essential to AI as well" (CNCF State of Cloud Native 2026). You're investing in a durable skill.

To go further, explore our experience feedback: migrating a production Kubernetes cluster at a major account.

How to Automate Your Future Upgrades?

You can script the complete procedure:

#!/bin/bash
set -euo pipefail

VERSION="1.30.0-1.1"
NODES=$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')

for node in $NODES; do
echo "Upgrading $node..."
kubectl drain $node --ignore-daemonsets --delete-emptydir-data
ssh $node "sudo apt-get update && sudo apt-get install -y kubeadm=$VERSION kubelet=$VERSION"
ssh $node "sudo kubeadm upgrade node && sudo systemctl restart kubelet"
kubectl uncordon $node
kubectl wait --for=condition=Ready node/$node --timeout=120s
done

Infrastructure engineers who master these procedures are highly sought after. According to Hired: "Demand and salaries for highly-skilled and qualified tech talent are fiercer than ever, and certifications present a clear pathway for IT professionals to further their careers."

If you're starting out, begin with the Kubernetes fundamentals before tackling advanced administration.

Take Action: Train in Kubernetes Administration

You now have a complete vision of the zero-downtime update process. To consolidate these skills and prepare for CKA certification, SFEIR offers official Linux Foundation trainings:

  • LFS458 Kubernetes Administration: 4 days of intensive training covering installation, configuration, updating, and troubleshooting of production clusters. This training prepares you directly for the CKA exam.

With more than 104,000 candidates having taken the CKA and 49% annual growth (CNCF Training Report), this certification validates your skills with employers. Certification remains valid for 2 years (Linux Foundation). For deeper learning, consult our system administrator Kubernetes Fundamentals training.

Contact SFEIR to plan your training path and get a personalized quote.