
Resolve the 10 Most Common Kubernetes Deployment Errors

SFEIR Institute

Key Takeaways

  • 80% of deployment errors are resolved with kubectl describe pod followed by kubectl logs
  • 3 main causes: images not found, insufficient resources, incorrect configuration
  • First reflexes: kubectl describe and kubectl logs

Kubernetes deployment errors are part of daily life for any infrastructure engineer. CrashLoopBackOff, ImagePullBackOff, OOMKilled: these error messages block production deployments. With 82% of container users running Kubernetes in production, mastering troubleshooting distinguishes an effective system administrator from a beginner.

TL;DR: Most deployment errors come from 3 causes: images not found, insufficient resources, or incorrect configuration. Use kubectl describe and kubectl logs as first steps.

This skill is essential for Cloud Operations Kubernetes engineers preparing for their certifications.

Why Master Kubernetes Deployment Error Solutions?

Error                        Frequency       Average Resolution Time
ImagePullBackOff             Very frequent   5-15 min
CrashLoopBackOff             Frequent        15-60 min
OOMKilled                    Frequent        10-30 min
Pending                      Frequent        5-30 min
CreateContainerConfigError   Medium          10-20 min

Remember: 80% of deployment errors are resolved with kubectl describe pod followed by kubectl logs. Master these two commands.

For more tutorials, check our Kubernetes Tutorials and Practical Guides hub.

Error 1: ImagePullBackOff and ErrImagePull

Symptom

kubectl get pods
NAME         READY   STATUS             RESTARTS   AGE
api-server   0/1     ImagePullBackOff   0          5m

Diagnosis

kubectl describe pod api-server | grep -A 10 Events

Possible causes:

  • Non-existent image or incorrect tag
  • Private registry without credentials
  • Docker Hub pull limit reached

Kubernetes Deployment Error Solutions for ImagePullBackOff

Incorrect image: check name and tag

# Verify that the image exists
docker pull registry.example.com/api:v1.2.3

# Correct the deployment
kubectl set image deployment/api-server api=registry.example.com/api:v1.2.3

Private registry: create an imagePullSecret

kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass \
  --docker-email=user@example.com

Then reference the secret in the pod spec:

spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: api
      image: registry.example.com/api:v1.2.3
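Many ImagePullBackOff incidents start with a simple typo in the image reference. As a minimal sketch (not the full OCI reference grammar; digests and nested heuristics are omitted), here is how container tooling roughly splits a reference into registry, repository, and tag:

```python
def parse_image(ref: str) -> dict:
    """Split an image reference into registry, repository and tag (simplified)."""
    tag = "latest"  # an untagged image silently pulls :latest
    if ":" in ref.rsplit("/", 1)[-1]:  # a colon in the last path segment separates the tag
        ref, tag = ref.rsplit(":", 1)
    registry, sep, repo = ref.partition("/")
    # Heuristic used by container tooling: the first segment is a registry host
    # only if it contains "." or ":" or is exactly "localhost".
    if not sep or not ("." in registry or ":" in registry or registry == "localhost"):
        registry, repo = "docker.io", ref
    return {"registry": registry, "repo": repo, "tag": tag}

print(parse_image("registry.example.com/api:v1.2.3"))
# {'registry': 'registry.example.com', 'repo': 'api', 'tag': 'v1.2.3'}
```

Running such a check in CI before kubectl apply catches malformed references earlier than the kubelet does.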

For manifest best practices, check our guide Best Practices for Structuring Your YAML Manifests.

Error 2: CrashLoopBackOff

Symptom

kubectl get pods
NAME         READY   STATUS             RESTARTS   AGE
api-server   0/1     CrashLoopBackOff   5          10m

Diagnosis

# Logs from the crashing container
kubectl logs api-server --previous

# Detailed events
kubectl describe pod api-server

Solutions

Application crashing at startup:

# Check the logs
kubectl logs api-server --previous

# Example output
Error: Cannot connect to database at db.example.com:5432

Misconfigured liveness probe:

# Too aggressive probe
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 3   # Too short if the app takes 30s to start
  periodSeconds: 3
  failureThreshold: 1

# Correct configuration
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30  # Allow time for startup
  periodSeconds: 10
  failureThreshold: 3

Remember: CrashLoopBackOff indicates the container starts then stops. The --previous logs show what happened before the crash.
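The "BackOff" part is why the pod seems stuck: the kubelet waits longer before each restart, doubling the delay from 10 seconds up to a 5-minute cap. A short sketch of that schedule:

```python
def crashloop_delays(restarts: int, base: int = 10, cap: int = 300) -> list:
    """Back-off delay (seconds) before each restart: doubles from 10s, capped at 5 min."""
    delays, delay = [], base
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays

print(crashloop_delays(6))  # [10, 20, 40, 80, 160, 300]
```

So after a handful of crashes the pod sits idle for up to 5 minutes between attempts, which is why fixing the root cause beats waiting for "one more restart".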

For a dedicated guide, see Debug a Pod in CrashLoopBackOff.

Error 3: OOMKilled (Out of Memory)

Symptom

kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
api-server   0/1     OOMKilled 3          15m

Diagnosis

kubectl describe pod api-server | grep -A 5 "Last State"

Kubernetes Deployment Error Solutions for OOMKilled

Increase memory limits:

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Was 256Mi, increased to 512Mi

Identify memory leaks:

# Monitor consumption
kubectl top pod api-server --containers

# Profile the application
kubectl exec -it api-server -- jcmd 1 VM.native_memory summary

According to ScaleOps, 65%+ of workloads run at less than half their allocated resources. Rightsizing avoids OOMKilled while optimizing costs.
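To reason about rightsizing, it helps to convert resource quantities like 512Mi into bytes and compare usage against the limit. A minimal sketch (a simplified parser, not the full Kubernetes quantity grammar):

```python
# Binary (Ki/Mi/Gi) and decimal (K/M/G) suffixes used in memory quantities.
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "K": 10**3, "M": 10**6, "G": 10**9}

def parse_quantity(q: str) -> int:
    """Convert a memory quantity like '512Mi' to bytes (simplified)."""
    for suffix, factor in sorted(UNITS.items(), key=lambda kv: -len(kv[0])):
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # plain bytes

def headroom(usage: str, limit: str) -> float:
    """Fraction of the memory limit still free; near 0 means OOMKill risk."""
    return 1 - parse_quantity(usage) / parse_quantity(limit)

print(headroom("480Mi", "512Mi"))  # 0.0625 -> only ~6% headroom left
```

Feeding kubectl top values through such a check flags containers running too close to their limit before the kernel kills them.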

Error 4: Pod Stuck in Pending

Symptom

kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
api-server   0/1     Pending   0          20m

Diagnosis

kubectl describe pod api-server | grep -A 10 Events

Typical messages:

  • Insufficient cpu / Insufficient memory
  • 0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate
  • pod has unbound immediate PersistentVolumeClaims

Solutions

Insufficient resources:

# Check available resources
kubectl describe nodes | grep -A 5 "Allocated resources"

# Reduce requests or add nodes
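The scheduler's resource predicate compares the sum of requests already placed on each node (not actual usage) against the node's allocatable capacity. A simplified sketch of that fit check, with CPU in millicores and memory in Mi:

```python
def fits(node_allocatable: dict, node_requested: dict, pod_requests: dict) -> bool:
    """True if the pod's requests fit in what remains on the node (simplified)."""
    return all(
        node_requested.get(r, 0) + need <= node_allocatable.get(r, 0)
        for r, need in pod_requests.items()
    )

node = {"cpu": 4000, "memory": 8192}  # 4 cores, 8 Gi allocatable
used = {"cpu": 3800, "memory": 4096}  # already requested by other pods
print(fits(node, used, {"cpu": 500, "memory": 512}))  # False: only 200m CPU left
```

This is why a cluster can look half-idle in monitoring yet still refuse to schedule a pod: overcommitted requests, not real load, drive the Pending state.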

Untolerated taints:

spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "api"
      effect: "NoSchedule"

Unbound PVC:

kubectl get pvc
kubectl describe pvc data-volume

To compare local development tools, see Minikube vs Kind vs K3s.

Error 5: CreateContainerConfigError

Symptom

kubectl get pods
NAME         READY   STATUS                       RESTARTS   AGE
api-server   0/1     CreateContainerConfigError   0          5m

Diagnosis

kubectl describe pod api-server | grep -A 10 Events

Solutions

Missing Secret or ConfigMap:

# Verify the secret exists
kubectl get secret db-credentials
kubectl get configmap app-config

# Create the missing secret
kubectl create secret generic db-credentials \
--from-literal=username=admin \
--from-literal=password=secret

Non-existent key in ConfigMap:

# Error: the DATABASE_URL key doesn't exist
env:
  - name: DB_URL
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: DATABASE_URL  # Verify this key exists

Remember: CreateContainerConfigError almost always means a missing or incorrectly referenced Secret or ConfigMap.
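Conceptually, the kubelet fails the container when an env reference points at a ConfigMap (or key) that doesn't exist. A sketch of that lookup, with deliberately simplified field names rather than the real PodSpec schema:

```python
def missing_keys(env_refs: list, configmaps: dict) -> list:
    """Return env var names whose ConfigMap reference points at a missing map or key."""
    problems = []
    for ref in env_refs:
        cm_name, key = ref["configMap"], ref["key"]
        if key not in configmaps.get(cm_name, {}):
            problems.append(ref["name"])
    return problems

cluster_cms = {"app-config": {"DB_URL": "postgres://db:5432"}}
refs = [{"name": "DB_URL", "configMap": "app-config", "key": "DATABASE_URL"}]
print(missing_keys(refs, cluster_cms))  # ['DB_URL'] - key DATABASE_URL doesn't exist
```

The same kind of pre-flight validation can be scripted against kubectl get configmap -o json output before deploying.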

Error 6: Service Not Accessible

Symptom

curl http://api-service:8080
curl: (7) Failed to connect to api-service port 8080

Diagnosis

# Check the service
kubectl get svc api-service
kubectl describe svc api-service

# Check endpoints
kubectl get endpoints api-service

Kubernetes Deployment Error Solutions for Services

No endpoints:

kubectl get endpoints api-service
NAME          ENDPOINTS   AGE
api-service   <none>      10m

The selector doesn't match any pod:

# Service
spec:
  selector:
    app: api-server  # Must match pod labels

# Pod
metadata:
  labels:
    app: api-server  # Verify it's identical
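The matching rule itself is simple: a Service selects a pod when every key/value pair of its selector is present in the pod's labels. A sketch of how the endpoints list is effectively computed:

```python
def matches(selector: dict, labels: dict) -> bool:
    """A Service selects a Pod when every selector pair appears in the Pod's labels."""
    return all(labels.get(k) == v for k, v in selector.items())

selector = {"app": "api-server"}
pods = {
    "api-1": {"app": "api-server", "tier": "backend"},
    "api-2": {"app": "apiserver"},  # typo: this pod never becomes an endpoint
}
endpoints = [name for name, labels in pods.items() if matches(selector, labels)]
print(endpoints)  # ['api-1']
```

Note the asymmetry: extra labels on the pod are fine, but every selector pair must match exactly, so a single typo empties the endpoints list.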

Incorrect port:

# Service exposes port 80, but the pod listens on 8080
spec:
  ports:
    - port: 80
      targetPort: 8080  # Must match containerPort

For system administrator training, check our dedicated Kubernetes system administrator training page.

Error 7: FailedScheduling with PodAffinity

Symptom

kubectl describe pod api-server
Events:
Warning  FailedScheduling  0/3 nodes are available: 3 node(s) didn't match pod affinity rules

Diagnosis

kubectl get pods -o wide --show-labels
kubectl get nodes --show-labels

Solutions

Affinity too strict:

# Use preferredDuringSchedulingIgnoredDuringExecution instead of required
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: api-server
          topologyKey: kubernetes.io/hostname

For production deployment, check our Deployment and Production Kubernetes hub.

Error 8: Readiness Probe Failed

Symptom

kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
api-server   0/1     Running   0          5m

The pod is Running but not Ready (0/1).

Diagnosis

kubectl describe pod api-server | grep -A 15 Readiness

Solutions

Incorrect health endpoint:

readinessProbe:
  httpGet:
    path: /healthz  # Verify this path exists
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

Unavailable dependency:

# Application waiting for a database
kubectl logs api-server | grep -i "connection"

Remember: A Running but not Ready pod doesn't receive traffic. The readinessProbe protects users from non-functional instances.
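The Ready condition doesn't flip on a single probe result: it takes failureThreshold consecutive failures to go NotReady and successThreshold consecutive successes to come back. A simplified sketch of that state machine (assuming the pod already passed its initial probes; real kubelet behavior starts NotReady until the first success):

```python
def ready_timeline(probe_results: list, failure_threshold: int = 3,
                   success_threshold: int = 1) -> list:
    """Ready state after each probe: flips only after N consecutive failures/successes."""
    ready, fails, successes, timeline = True, 0, 0, []
    for ok in probe_results:
        if ok:
            successes, fails = successes + 1, 0
            if successes >= success_threshold:
                ready = True
        else:
            fails, successes = fails + 1, 0
            if fails >= failure_threshold:
                ready = False
        timeline.append(ready)
    return timeline

# Two failures don't flip the pod out of Ready; the third does.
print(ready_timeline([False, False, False, True]))  # [True, True, False, True]
```

This is why raising failureThreshold smooths over transient blips, at the cost of routing traffic to a sick pod for a few extra probe periods.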

Error 9: Volume Mount Failed

Symptom

kubectl describe pod api-server
Warning  FailedMount  Unable to mount volumes: timeout expired waiting for volumes

Diagnosis

kubectl get pv
kubectl get pvc
kubectl describe pvc data-volume

Solutions

PVC not provisioned:

# Check the StorageClass
kubectl get sc

# Create a PVC with the right StorageClass
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi
EOF

For Docker Compose migration, see Migrate from Docker Compose to Kubernetes.

Error 10: NetworkPolicy Blocking Traffic

Symptom

kubectl exec -it client-pod -- curl http://api-service:8080
curl: (7) Failed to connect

But the service and endpoints are correct.

Diagnosis

kubectl get networkpolicy -A
kubectl describe networkpolicy -n default

Solutions

Add an ingress rule:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-access
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: client
      ports:
        - protocol: TCP
          port: 8080
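The evaluation model explains why traffic "suddenly" breaks: ingress is default-allow until any policy selects a pod, after which only explicitly matched traffic passes. A heavily simplified sketch (real policies also support namespaceSelector, ipBlock, and empty rules meaning allow-all):

```python
def allowed(target_labels: dict, client_labels: dict, port: int,
            policies: list) -> bool:
    """Simplified ingress check: default-allow unless some policy selects the pod,
    then allow only if a rule matches the client's labels and the port."""
    def sel(selector, labels):
        return all(labels.get(k) == v for k, v in selector.items())

    selecting = [p for p in policies if sel(p["podSelector"], target_labels)]
    if not selecting:
        return True  # no policy selects the pod: all ingress allowed
    return any(
        sel(rule["from"], client_labels) and port in rule["ports"]
        for p in selecting for rule in p["ingress"]
    )

policy = {"podSelector": {"app": "api-server"},
          "ingress": [{"from": {"role": "client"}, "ports": [8080]}]}
print(allowed({"app": "api-server"}, {"role": "client"}, 8080, [policy]))  # True
print(allowed({"app": "api-server"}, {"role": "batch"}, 8080, [policy]))   # False
```

Applying any policy to a pod therefore implicitly denies everything the policy doesn't mention, which is the usual source of these surprises.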

For workload security, see Secure Your Kubernetes Workloads.

Quick Diagnostic Checklist

# 1. Pod status
kubectl get pod <pod-name> -o wide

# 2. Events
kubectl describe pod <pod-name> | tail -20

# 3. Logs
kubectl logs <pod-name> --previous

# 4. Container state
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].state}'

# 5. Node resources
kubectl describe node <node-name> | grep -A 10 "Allocated"

Remember: This 5-command sequence resolves 90% of deployment problems. Memorize it.
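The checklist can also be encoded as a lookup: given a pod status, jump straight to the most useful command from the sections above. A hypothetical helper along those lines:

```python
# Hypothetical helper: map a pod status to the first diagnostic command to run.
NEXT_STEP = {
    "ImagePullBackOff":           "kubectl describe pod {pod} | grep -A 10 Events",
    "CrashLoopBackOff":           "kubectl logs {pod} --previous",
    "OOMKilled":                  "kubectl describe pod {pod} | grep -A 5 'Last State'",
    "Pending":                    "kubectl describe pod {pod} | grep -A 10 Events",
    "CreateContainerConfigError": "kubectl describe pod {pod} | grep -A 10 Events",
}

def next_step(status: str, pod: str) -> str:
    """Suggest the first diagnostic command for a given pod status."""
    return NEXT_STEP.get(status, "kubectl describe pod {pod}").format(pod=pod)

print(next_step("CrashLoopBackOff", "api-server"))
# kubectl logs api-server --previous
```

Wrapping this in a small shell alias gives on-call engineers the "first reflex" from the TL;DR without thinking.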

Take Action: Master Kubernetes Troubleshooting

Troubleshooting represents a key skill for the CKA exam. According to the Linux Foundation, the exam lasts 2 hours with a passing score of 66%. Practical troubleshooting questions count for a significant portion of the score.

Train with SFEIR to acquire these reflexes.

Contact our advisors to organize training tailored to your team's needs.