Key Takeaways
- ✓ `kubectl drain` evacuates pods from a node before maintenance
- ✓ With 20+ clusters per organization on average, autoscaling is a practical necessity (Spectrocloud 2025)
- ✓ `kubectl cordon` prevents scheduling of new pods on a node
Kubernetes node management is a fundamental skill for any production cluster administrator. With 82% of container users running Kubernetes in production and an average of 20+ clusters per organization, mastering drain, cordon, and autoscaling operations becomes essential. This guide walks you step by step through Kubernetes node management, from adding a worker node to configuring autoscaling.
TL;DR: To manage your Kubernetes nodes effectively, use `kubectl cordon` to prevent scheduling, `kubectl drain` to evacuate pods before maintenance, and configure the Cluster Autoscaler to adjust capacity automatically. Verify each operation with `kubectl get nodes` and `kubectl describe node`.
These skills are at the core of the LFS458 Kubernetes Administration training.
Prerequisites: required environment and tools
Before starting this practical guide, ensure you have the following:
Infrastructure:
- A working Kubernetes cluster (v1.28+)
- SSH access to cluster nodes
- Administrator rights (ClusterRole `cluster-admin`)
Installed tools:
# Check kubectl version
kubectl version --client
# Client Version: v1.29.0
# Check cluster access
kubectl cluster-info
# Kubernetes control plane is running at https://192.168.1.10:6443
Prior knowledge:
- Understanding of Kubernetes control plane architecture
- Familiarity with basic concepts (consult Kubernetes fundamentals)
Key takeaway: Without `cluster-admin` access, you won't be able to perform drain operations or modify node taints. Check your permissions with, for example, `kubectl auth can-i create pods/eviction` (drain evicts pods through the eviction API).
Step 1: Add a new worker node to the cluster
Adding a worker node is done in three phases: server preparation, join token generation, and integration verification.
1.1 Prepare the new server
Install dependencies on the new node:
# Disable swap (required for Kubernetes)
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
# Install containerd
sudo apt-get update && sudo apt-get install -y containerd
# Configure containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
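Depending on the distribution, kubeadm's preflight checks may also require the `br_netfilter` kernel module and a few sysctl settings before the join will succeed. A preparation sketch following the upstream Kubernetes container-runtime prerequisites (file paths are conventional, not mandatory):

```shell
# Load the kernel modules Kubernetes networking relies on
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# Let iptables see bridged traffic and enable IP forwarding
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system
```

On systemd-based distributions, also consider setting `SystemdCgroup = true` in `/etc/containerd/config.toml` so containerd and the kubelet agree on the cgroup driver.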
1.2 Generate the join token from the control plane
On the master node, create a new token:
kubeadm token create --print-join-command
# Expected result:
# kubeadm join 192.168.1.10:6443 --token abcdef.0123456789abcdef \
# --discovery-token-ca-cert-hash sha256:xyz123...
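Bootstrap tokens expire after 24 hours by default. If you already have a valid token but lost the printed hash (or vice versa), both can be recovered on the control plane; a sketch using the commands documented for kubeadm:

```shell
# List existing bootstrap tokens and their expiration
kubeadm token list

# Recompute the CA certificate hash used by --discovery-token-ca-cert-hash
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
```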
1.3 Execute the join command on the worker
On the new worker node:
sudo kubeadm join 192.168.1.10:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:xyz123...
# [preflight] Running pre-flight checks
# [kubelet-start] Starting the kubelet
# This node has joined the cluster
Verification from master:
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# master Ready control-plane 30d v1.29.0
# worker1 Ready <none> 25d v1.29.0
# worker2 Ready <none> 10s v1.29.0
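The `<none>` in the ROLES column is cosmetic, but if you want `kubectl get nodes` to display a worker role, you can add a role label yourself (the `worker` role name here is a convention, not a requirement):

```shell
# Add a role label so the ROLES column shows "worker" (empty value is fine)
kubectl label node worker2 node-role.kubernetes.io/worker=
```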
For a complete multi-node installation, consult the kubeadm installation guide.
Step 2: Use cordon to isolate a node
The kubectl cordon command marks a node as non-schedulable. Existing pods continue running, but no new pods will be placed on this node.
2.1 Mark a node as non-schedulable
kubectl cordon worker2
# node/worker2 cordoned
Check node status:
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# master Ready control-plane 30d v1.29.0
# worker1 Ready <none> 25d v1.29.0
# worker2 Ready,SchedulingDisabled <none> 1h v1.29.0
2.2 Examine applied taints
kubectl describe node worker2 | grep -A5 Taints
# Taints: node.kubernetes.io/unschedulable:NoSchedule
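Under the hood, cordon simply sets `spec.unschedulable` on the Node object, which the scheduler surfaces as the taint above. The same effect can be achieved with a patch, useful from automation that doesn't shell out to `kubectl cordon`:

```shell
# Equivalent to "kubectl cordon worker2"
kubectl patch node worker2 -p '{"spec":{"unschedulable":true}}'

# Equivalent to "kubectl uncordon worker2"
kubectl patch node worker2 -p '{"spec":{"unschedulable":false}}'
```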
Key takeaway: Cordon is ideal for planned maintenance. Existing pods are not impacted, allowing you to prepare maintenance without service interruption.
2.3 Restore scheduling
Use uncordon to reactivate the node:
kubectl uncordon worker2
# node/worker2 uncordoned
kubectl get nodes worker2
# NAME STATUS ROLES AGE VERSION
# worker2 Ready <none> 1h v1.29.0
Step 3: Perform a safe drain for maintenance
kubectl drain evacuates all pods from a node before maintenance. This operation is essential for Kubernetes production node maintenance.
3.1 Standard drain with safety options
kubectl drain worker2 --ignore-daemonsets --delete-emptydir-data
# node/worker2 cordoned
# evicting pod default/nginx-deployment-abc123
# evicting pod kube-system/coredns-xyz789
# pod/nginx-deployment-abc123 evicted
# pod/coredns-xyz789 evicted
# node/worker2 drained
Essential options:
| Option | Description |
|---|---|
| `--ignore-daemonsets` | Proceed despite DaemonSet-managed pods (they cannot be evicted; the DaemonSet controller would recreate them) |
| `--delete-emptydir-data` | Continue even if pods use emptyDir volumes (their data is deleted) |
| `--force` | Force eviction of standalone pods not managed by a controller |
| `--grace-period=30` | Grace period in seconds for pod shutdown |
| `--timeout=300s` | Operation timeout |
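For planned maintenance across several workers, the options above can be combined in a simple loop that processes one node at a time. A sketch only; the node names and the maintenance step are placeholders:

```shell
# Drain, maintain, and reactivate workers sequentially.
# Stop the loop if any drain fails (e.g. a PDB blocks eviction).
for node in worker1 worker2; do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s || break
  # ... perform maintenance on $node here ...
  kubectl uncordon "$node"
done
```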
3.2 Handle PodDisruptionBudgets
PodDisruptionBudgets (PDB) can block a drain:
kubectl drain worker2 --ignore-daemonsets
# error: cannot delete pods with local storage, evicting pods with local storage may cause data loss
# error: unable to drain node "worker2" due to PodDisruptionBudget
Check PDBs:
kubectl get pdb -A
# NAMESPACE NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
# default nginx-pdb 2 N/A 0 10d
Configure PDBs only for critical workloads.
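A PDB like the `nginx-pdb` shown in the output above could be created as follows (a sketch; the `app: nginx` selector label is an assumption about how the deployment's pods are labeled):

```shell
# Keep at least 2 nginx pods available during voluntary disruptions (drains)
cat <<EOF | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nginx
EOF
```

With only 2 replicas running, this PDB allows zero disruptions, which is exactly the blocked-drain situation shown above; size `minAvailable` below the replica count.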
3.3 Post-drain verification
kubectl get pods -o wide | grep worker2
# (only DaemonSet pods remain - all other pods have been evacuated)
kubectl get nodes worker2
# NAME STATUS ROLES AGE VERSION
# worker2 Ready,SchedulingDisabled <none> 1h v1.29.0
Key takeaway: Always check PDBs before a planned drain. A blocked drain can delay critical maintenance and create emergency situations.
Step 4: Configure node autoscaling
Kubernetes cluster node autoscaling allows automatic adaptation of cluster capacity to load. According to ScaleOps, 65%+ of workloads use less than half their allocated resources.
4.1 Install the Cluster Autoscaler
Deploy the Cluster Autoscaler (AWS EKS example):
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
Configure parameters:
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler
namespace: kube-system
spec:
template:
spec:
containers:
- name: cluster-autoscaler
command:
- ./cluster-autoscaler
- --cloud-provider=aws
- --nodes=2:10:my-node-group
- --scale-down-delay-after-add=10m
- --scale-down-unneeded-time=10m
- --skip-nodes-with-local-storage=false
4.2 Verify operation
kubectl get pods -n kube-system -l app=cluster-autoscaler
# NAME READY STATUS RESTARTS AGE
# cluster-autoscaler-7c4d5f8d9-abcde 1/1 Running 0 5m
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=20
# I0215 10:30:00.123456 1 scale_up.go:300] Scaled up node group my-node-group: 3 -> 4
The LFS458 training covers autoscaling configuration for different cloud providers in detail.
4.3 Configure scaling metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: nginx-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: nginx
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
kubectl apply -f nginx-hpa.yaml
kubectl get hpa
# NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
# nginx-hpa Deployment/nginx 45%/70% 3 10 3 1m
Step 5: Upgrade a node without interruption
Upgrading a node combines cordon and drain into a single workflow, and is routine production maintenance.
5.1 Rolling upgrade process
# 1. Drain the node
kubectl drain worker2 --ignore-daemonsets --delete-emptydir-data
# 2. Perform maintenance (OS update, kubelet, etc.)
sudo apt-get update && sudo apt-get install -y --only-upgrade kubelet kubeadm
sudo systemctl restart kubelet
# 3. Verify version
kubectl get nodes worker2
# NAME STATUS ROLES AGE VERSION
# worker2 Ready,SchedulingDisabled <none> 30d v1.29.1
# 4. Reactivate the node
kubectl uncordon worker2
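After the kubelet restarts, the node can take a few seconds to report Ready again. Rather than polling `kubectl get nodes` manually, `kubectl wait` blocks until the condition holds (or the timeout expires):

```shell
# Block until worker2 reports Ready; non-zero exit after the timeout
kubectl wait --for=condition=Ready node/worker2 --timeout=180s
```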
5.2 Post-maintenance health check
kubectl get nodes -o wide
# All nodes should be Ready
kubectl get pods -A -o wide | grep -v Running
# No pods should be in non-Running state
To deepen network aspects of your cluster, consult the guide on CNI, Services and Ingress network configuration.
Troubleshooting: common problems and solutions
Drain remains blocked
Symptom: The kubectl drain command doesn't complete.
kubectl drain worker2 --ignore-daemonsets --timeout=60s
# evicting pod default/stuck-pod
# error: timed out waiting for pod to be deleted
Solution:
# Identify the blocking pod
kubectl get pods -o wide | grep worker2
# Force deletion if necessary
kubectl delete pod stuck-pod --force --grace-period=0
# Retry drain
kubectl drain worker2 --ignore-daemonsets --force
Node remains NotReady after maintenance
Diagnosis:
kubectl describe node worker2 | grep -A10 Conditions
# Type Status
# MemoryPressure False
# DiskPressure False
# PIDPressure False
# Ready False
# Check kubelet logs
sudo journalctl -u kubelet -f
Common solution:
sudo systemctl restart kubelet
sudo systemctl restart containerd
Key takeaway: Document each maintenance operation in a runbook. The ecosystem evolves constantly, and your procedures must keep pace.
Autoscaler doesn't trigger scale-up
Verification:
kubectl describe configmap cluster-autoscaler-status -n kube-system
# Check non-scaling reasons
kubectl get events -n kube-system | grep autoscaler
For advanced node management and certification preparation, explore Kubernetes cluster administration and etcd backup techniques.
Next steps: validate your skills
Kubernetes node management is a key skill for any administrator preparing for the CKA certification. With 104,000 people having taken the CKA and 49% annual growth, these certifications validate your expertise to employers.
As a company CTO confirms: "Just given the capabilities that exist with Kubernetes, and the company's desire to consume more AI tools, we will use Kubernetes more in future."
Recommended training:
- LFS458 Kubernetes Administration: 4 days to master cluster administration and prepare for CKA
- LFD459 Kubernetes for developers: 3 days oriented toward application development and CKAD
- Kubernetes fundamentals: 1 day to discover essential concepts. To go deeper, consult our Paris cluster administration training.
To explore all Kubernetes skills, consult the Complete Kubernetes Training Guide.