## Key Takeaways
- 80% of deployment errors are resolved with `kubectl describe pod` followed by `kubectl logs`
- 3 main causes: images not found, insufficient resources, incorrect configuration
- First reflexes: `kubectl describe` and `kubectl logs`
Kubernetes deployment errors are part of daily life for any infrastructure engineer. CrashLoopBackOff, ImagePullBackOff, OOMKilled: these error messages block production deployments. With 82% of container users running Kubernetes in production, mastering troubleshooting distinguishes an effective system administrator from a beginner.
TL;DR: Most deployment errors come from 3 causes: images not found, insufficient resources, or incorrect configuration. Use `kubectl describe` and `kubectl logs` as first steps.
This skill is essential for Cloud Operations Kubernetes engineers preparing for their certifications.
## Why Master Kubernetes Deployment Error Solutions?
| Error | Frequency | Average Resolution Time |
|---|---|---|
| ImagePullBackOff | Very frequent | 5-15 min |
| CrashLoopBackOff | Frequent | 15-60 min |
| OOMKilled | Frequent | 10-30 min |
| Pending | Frequent | 5-30 min |
| CreateContainerConfigError | Medium | 10-20 min |
Remember: 80% of deployment errors are resolved with `kubectl describe pod` followed by `kubectl logs`. Master these two commands.
For more tutorials, check our Kubernetes Tutorials and Practical Guides hub.
## Error 1: ImagePullBackOff and ErrImagePull

### Symptom

```shell
kubectl get pods
NAME         READY   STATUS             RESTARTS   AGE
api-server   0/1     ImagePullBackOff   0          5m
```

### Diagnosis

```shell
kubectl describe pod api-server | grep -A 10 Events
```
Possible causes:
- Non-existent image or incorrect tag
- Private registry without credentials
- Docker Hub pull limit reached
### Kubernetes Deployment Error Solutions for ImagePullBackOff

**Incorrect image:** check the name and tag.

```shell
# Verify that the image exists
docker pull registry.example.com/api:v1.2.3

# Correct the deployment
kubectl set image deployment/api-server api=registry.example.com/api:v1.2.3
```

**Private registry:** create an imagePullSecret.

```shell
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass \
  --docker-email=user@example.com
```

Then reference it in the pod spec:

```yaml
spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: api
    image: registry.example.com/api:v1.2.3
```
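For context, the `imagePullSecrets` field belongs in the pod template of a Deployment. A minimal sketch (the names `api-server` and `regcred` are illustrative, matching the examples above):

```yaml
# Minimal Deployment showing where imagePullSecrets lives in the pod template.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
      - name: api
        image: registry.example.com/api:v1.2.3
```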
For manifest best practices, check our guide Best Practices for Structuring Your YAML Manifests.
## Error 2: CrashLoopBackOff

### Symptom

```shell
kubectl get pods
NAME         READY   STATUS             RESTARTS   AGE
api-server   0/1     CrashLoopBackOff   5          10m
```

### Diagnosis

```shell
# Logs from the crashing container
kubectl logs api-server --previous

# Detailed events
kubectl describe pod api-server
```
### Solutions

**Application crashing at startup:**

```shell
# Check the logs
kubectl logs api-server --previous

# Example output
Error: Cannot connect to database at db.example.com:5432
```

**Misconfigured liveness probe:**

```yaml
# Too aggressive probe
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 3   # Too short if the app takes 30s to start
  periodSeconds: 3
  failureThreshold: 1
```

```yaml
# Correct configuration
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30  # Allow time for startup
  periodSeconds: 10
  failureThreshold: 3
```
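For slow-starting applications, Kubernetes (1.18+) also offers a dedicated `startupProbe`, which suspends the liveness probe until it succeeds once. A sketch reusing the same `/health` endpoint (values are illustrative):

```yaml
# Gives the app up to 30 × 10 = 300s to start before liveness checks begin.
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```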
Remember: CrashLoopBackOff indicates the container starts then stops. The --previous logs show what happened before the crash.
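The back-off itself is exponential: the kubelet waits 10s, then 20s, 40s, and so on between restarts, capped at five minutes. A quick shell sketch of that schedule:

```shell
# CrashLoopBackOff restart delays: start at 10s, double each crash, cap at 300s.
delay=10
for restart in 1 2 3 4 5 6; do
  echo "restart $restart: next attempt in ${delay}s"
  delay=$(( delay * 2 ))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
# prints delays of 10, 20, 40, 80, 160, 300 seconds
```

This is why a pod that has been crashing for a while can sit "idle" for up to five minutes before its next restart attempt.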
For a dedicated guide, see Debug a Pod in CrashLoopBackOff.
## Error 3: OOMKilled (Out of Memory)

### Symptom

```shell
kubectl get pods
NAME         READY   STATUS      RESTARTS   AGE
api-server   0/1     OOMKilled   3          15m
```

### Diagnosis

```shell
kubectl describe pod api-server | grep -A 5 "Last State"
```
### Kubernetes Deployment Error Solutions for OOMKilled

**Increase memory limits:**

```yaml
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Was 256Mi, increased to 512Mi
```

**Identify memory leaks:**

```shell
# Monitor consumption
kubectl top pod api-server --containers

# Profile the application (example for a JVM-based container)
kubectl exec -it api-server -- jcmd 1 VM.native_memory summary
```
According to ScaleOps, 65%+ of workloads run at less than half their allocated resources. Rightsizing avoids OOMKilled while optimizing costs.
## Error 4: Pod Stuck in Pending

### Symptom

```shell
kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
api-server   0/1     Pending   0          20m
```

### Diagnosis

```shell
kubectl describe pod api-server | grep -A 10 Events
```

Typical messages:
- `Insufficient cpu` / `Insufficient memory`
- `0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate`
- `pod has unbound immediate PersistentVolumeClaims`
### Solutions

**Insufficient resources:**

```shell
# Check available resources
kubectl describe nodes | grep -A 5 "Allocated resources"

# Then reduce the pod's requests or add nodes
```

**Untolerated taints:**

```yaml
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "api"
    effect: "NoSchedule"
```

**Unbound PVC:**

```shell
kubectl get pvc
kubectl describe pvc data-volume
```
To compare local development tools, see Minikube vs Kind vs K3s.
## Error 5: CreateContainerConfigError

### Symptom

```shell
kubectl get pods
NAME         READY   STATUS                       RESTARTS   AGE
api-server   0/1     CreateContainerConfigError   0          5m
```

### Diagnosis

```shell
kubectl describe pod api-server | grep -A 10 Events
```
### Solutions

**Missing Secret or ConfigMap:**

```shell
# Verify the Secret and ConfigMap exist
kubectl get secret db-credentials
kubectl get configmap app-config

# Create the missing secret
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password=secret
```

**Non-existent key in ConfigMap:**

```yaml
# Error: the DATABASE_URL key doesn't exist in app-config
env:
- name: DB_URL
  valueFrom:
    configMapKeyRef:
      name: app-config
      key: DATABASE_URL  # Verify this key exists
```
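When a key is legitimately absent in some environments, `configMapKeyRef` accepts an `optional` field so the container can still start without it. A sketch with the same assumed names:

```yaml
env:
- name: DB_URL
  valueFrom:
    configMapKeyRef:
      name: app-config
      key: DATABASE_URL
      optional: true  # the pod starts even if the key (or the ConfigMap) is missing
```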
Remember: CreateContainerConfigError almost always means a missing or incorrectly referenced Secret or ConfigMap.
## Error 6: Service Not Accessible

### Symptom

```shell
curl http://api-service:8080
curl: (7) Failed to connect to api-service port 8080
```

### Diagnosis

```shell
# Check the service
kubectl get svc api-service
kubectl describe svc api-service

# Check endpoints
kubectl get endpoints api-service
```
### Kubernetes Deployment Error Solutions for Services

**No endpoints:**

```shell
kubectl get endpoints api-service
NAME          ENDPOINTS   AGE
api-service   <none>      10m
```

**The selector doesn't match any pod:**

```yaml
# Service
spec:
  selector:
    app: api-server  # Must match pod labels
```

```yaml
# Pod
metadata:
  labels:
    app: api-server  # Verify it's identical
```

**Incorrect port:**

```yaml
# Service exposes port 80, but the pod listens on 8080
spec:
  ports:
  - port: 80
    targetPort: 8080  # Must match containerPort
```
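Putting the pieces together, here is a sketch of a Service and Deployment with the selector, labels, and ports lined up (names and image are illustrative assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api-server      # matches the pod template labels below
  ports:
  - port: 80             # port clients call
    targetPort: 8080     # matches containerPort below
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api
        image: registry.example.com/api:v1.2.3
        ports:
        - containerPort: 8080
```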
For system administrator training, check our dedicated Kubernetes system administrator training page.
## Error 7: FailedScheduling with PodAffinity

### Symptom

```shell
kubectl describe pod api-server
Events:
  Warning  FailedScheduling  0/3 nodes are available: 3 node(s) didn't match pod affinity rules
```

### Diagnosis

```shell
kubectl get pods -o wide --show-labels
kubectl get nodes --show-labels
```
### Solutions

**Affinity too strict:**

```yaml
# Use preferredDuringScheduling... instead of requiredDuringScheduling...
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: api-server
        topologyKey: kubernetes.io/hostname
```
For production deployment, check our Deployment and Production Kubernetes hub.
## Error 8: Readiness Probe Failed

### Symptom

```shell
kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
api-server   0/1     Running   0          5m
```

The pod is Running but not Ready (0/1).

### Diagnosis

```shell
kubectl describe pod api-server | grep -A 15 Readiness
```
### Solutions

**Incorrect health endpoint:**

```yaml
readinessProbe:
  httpGet:
    path: /healthz  # Verify this path exists
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```
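If the application exposes no HTTP health endpoint, Kubernetes also supports `tcpSocket` and `exec` probes. A minimal sketch (values illustrative):

```yaml
readinessProbe:
  tcpSocket:
    port: 8080       # succeeds as soon as the TCP port accepts connections
  initialDelaySeconds: 10
  periodSeconds: 5
```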
**Unavailable dependency:**

```shell
# Application waiting for a database
kubectl logs api-server | grep -i "connection"
```
Remember: A Running but not Ready pod doesn't receive traffic. The readinessProbe protects users from non-functional instances.
## Error 9: Volume Mount Failed

### Symptom

```shell
kubectl describe pod api-server
Warning  FailedMount  Unable to mount volumes: timeout expired waiting for volumes
```

### Diagnosis

```shell
kubectl get pv
kubectl get pvc
kubectl describe pvc data-volume
```
### Solutions

**PVC not provisioned:**

```shell
# Check the StorageClass
kubectl get sc

# Create a PVC with the right StorageClass
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi
EOF
```
For Docker Compose migration, see Migrate from Docker Compose to Kubernetes.
## Error 10: NetworkPolicy Blocking Traffic

### Symptom

```shell
kubectl exec -it client-pod -- curl http://api-service:8080
curl: (7) Failed to connect
```

But the service and endpoints are correct.

### Diagnosis

```shell
kubectl get networkpolicy -A
kubectl describe networkpolicy -n default
```
### Solutions

**Add an ingress rule:**

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-access
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: client
    ports:
    - protocol: TCP
      port: 8080
```
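The usual culprit behind this symptom is a cluster-wide default-deny policy like the sketch below: it selects every pod in the namespace and allows nothing, so any connection not covered by an explicit allow rule is dropped.

```yaml
# Default-deny ingress: empty podSelector matches all pods in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```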
For workload security, see Secure Your Kubernetes Workloads.
Quick Diagnostic Checklist
# 1. Pod status
kubectl get pod <pod-name> -o wide
# 2. Events
kubectl describe pod <pod-name> | tail -20
# 3. Logs
kubectl logs <pod-name> --previous
# 4. Container state
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].state}'
# 5. Node resources
kubectl describe node <node-name> | grep -A 10 "Allocated"
Remember: This 5-command sequence resolves 90% of deployment problems. Memorize it.
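The checklist can be wrapped in a small shell function kept in your `.bashrc`. This is a hypothetical helper, not an official tool; the name `triage` and the `tail`/`jsonpath` windows are illustrative choices:

```shell
#!/usr/bin/env bash
# triage <pod-name>: run the diagnostic checklist against one pod.
triage() {
  local pod="$1"
  kubectl get pod "$pod" -o wide
  kubectl describe pod "$pod" | tail -20
  kubectl logs "$pod" --previous
  kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].state}'
}
```

Usage: `triage api-server` prints status, recent events, previous-container logs, and the current container state in one pass.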
## Take Action: Master Kubernetes Troubleshooting
Troubleshooting represents a key skill for the CKA exam. According to the Linux Foundation, the exam lasts 2 hours with a passing score of 66%. Practical troubleshooting questions count for a significant portion of the score.
Train with SFEIR to acquire these reflexes:
- The LFS458 Kubernetes Administration training includes troubleshooting labs over 4 days
- The Kubernetes Fundamentals training lays the foundation for understanding errors
- The LFD459 Developer training covers application debugging
Contact our advisors to organize training tailored to your team's needs.