
Resolve Common Kubernetes Deployment Errors

SFEIR Institute

Key Takeaways

  • IT teams spend 34 days/year on Kubernetes troubleshooting
  • 60% of cluster management time is dedicated to troubleshooting
  • kubectl describe, logs, and get events resolve 90% of problems

Resolving common Kubernetes deployment errors represents a critical skill for any Cloud Operations Kubernetes engineer. IT teams spend an average of 34 working days per year resolving Kubernetes problems according to Cloud Native Now.

More than 60% of cluster management time is spent on troubleshooting according to Spectro Cloud. This guide gives you the exact commands and proven solutions to diagnose and fix CrashLoopBackOff, ImagePullBackOff, and other common errors.

TL;DR: Kubernetes deployment errors follow predictable patterns. This guide covers the 7 most frequent errors with their diagnostic commands, root causes, and solutions. Master kubectl describe, kubectl logs --previous, and kubectl get events to resolve 90% of issues.

To master Kubernetes pod debugging in real conditions, follow the LFD459 Kubernetes for Application Developers training.

Quick Symptom Index

| Pod Status | Meaning | Section |
|---|---|---|
| CrashLoopBackOff | Container restarts in a loop | CrashLoopBackOff |
| ImagePullBackOff | Image not found or inaccessible | ImagePullBackOff |
| Pending | Pod not scheduled to a node | Pending |
| CreateContainerConfigError | Configuration problem | ConfigError |
| OOMKilled | Memory limit exceeded | OOMKilled |
| Running but unhealthy | Failing probes | Probes |
| Evicted | Node under pressure | Eviction |
Remember: 82% of container users run Kubernetes in production according to the CNCF Annual Survey 2025. Mastering troubleshooting is essential.

CrashLoopBackOff: Container Restarts in a Loop {#crashloopbackoff}

CrashLoopBackOff indicates that your container starts, crashes, and is restarted by Kubernetes with exponential backoff. This status accounts for 40% of Kubernetes support tickets.

Symptom

$ kubectl get pods
NAME           READY   STATUS             RESTARTS      AGE
api-server-1   0/1     CrashLoopBackOff   5 (2m ago)    10m

Step 1: Examine Crashed Container Logs

# Current instance logs
kubectl logs api-server-1

# Previous instance logs (after crash)
kubectl logs api-server-1 --previous

# Follow logs in real-time
kubectl logs api-server-1 -f

Step 2: Analyze Pod Events

kubectl describe pod api-server-1 | grep -A20 "Events:"

Causes and Solutions

| Cause | Diagnostic Indicator | Solution |
|---|---|---|
| Application crash at startup | Stack trace in logs | Fix code, verify dependencies |
| Missing environment variable | KeyError, undefined | Add variable in Deployment |
| Missing config file | FileNotFoundError | Verify ConfigMap and Secret mounts |
| Port already in use | Address already in use | Modify containerPort or kill the process |
| Invalid command | exec format error | Check command: and args: in spec |
# Fix example: adding a missing variable
spec:
  containers:
  - name: api
    env:
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: url

ImagePullBackOff: Image Not Found {#imagepullbackoff}

ImagePullBackOff means Kubernetes cannot download the specified container image. This problem occurs in 25% of first deployments.

Symptom

$ kubectl get pods
NAME         READY   STATUS             RESTARTS   AGE
web-app-1    0/1     ImagePullBackOff   0          5m

Diagnosis

# See exact error message
kubectl describe pod web-app-1 | grep -A5 "Events:"

# Check image name
kubectl get pod web-app-1 -o jsonpath='{.spec.containers[*].image}'

Causes and Solutions

| Cause | Error Message | Solution |
|---|---|---|
| Non-existent image | manifest unknown | Verify image name and tag |
| Private registry | unauthorized | Create an imagePullSecret |
| Invalid tag | tag not found | Use an existing tag (latest, v1.2.3) |
| Blocked network | connection refused | Check firewall rules |
# Create secret for private registry
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass \
  --docker-email=user@example.com

# Reference it on the default service account
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'
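As an alternative to patching the default service account, the pull secret can be referenced directly in the pod spec. A minimal sketch, assuming the regcred secret created above; the image path is hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app-1
spec:
  imagePullSecrets:
  - name: regcred  # secret created with kubectl create secret docker-registry
  containers:
  - name: web
    image: registry.example.com/team/web-app:v1.2.3  # hypothetical image path
```

Referencing the secret in the spec scopes it to this pod, while patching the service account applies it to every pod using that account.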

To go deeper on these techniques, see the advanced pod and container debugging guide.

Pending: Pod Not Scheduled {#pending}

A Pending pod has not been assigned to a node by the scheduler. The cause is usually a lack of resources or scheduling constraints that cannot be satisfied.

Diagnosis

# Identify the pending reason
kubectl describe pod my-pod | grep -A10 "Events:"

# See available resources on nodes
kubectl describe nodes | grep -A5 "Allocated resources"

# List node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

Causes and Solutions

| Cause | Event Message | Solution |
|---|---|---|
| Insufficient resources | Insufficient cpu/memory | Reduce requests or add nodes |
| Impossible NodeSelector | node(s) didn't match selector | Add label to node or modify selector |
| Untolerated taints | node(s) had taints | Add tolerations to pod |
| Unbound PVC | persistentvolumeclaim not bound | Check PV and StorageClass |
# Example: adjust requests to avoid Pending
resources:
  requests:
    memory: "128Mi"  # Reduce if too high
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"
Remember: Always configure realistic requests. Overestimated requests block scheduling even if the node has real capacity.
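For the untolerated-taints case, a matching toleration goes in the pod spec. A sketch assuming a hypothetical dedicated=gpu:NoSchedule taint on the target node:

```yaml
# Pod spec fragment tolerating a (hypothetical) node taint
# applied with: kubectl taint nodes my-node dedicated=gpu:NoSchedule
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
```

A toleration only allows scheduling onto the tainted node; combine it with a nodeSelector if the pod must land there.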

CreateContainerConfigError: Configuration Problem {#createcontainerconfigerror}

CreateContainerConfigError indicates an error in container configuration before it even starts. The problem often comes from referenced Secrets or ConfigMaps.

Diagnosis

kubectl describe pod my-pod | grep -A3 "Warning"

# Verify Secret exists
kubectl get secret my-secret

# Verify key exists in Secret
kubectl get secret my-secret -o jsonpath='{.data}'

Causes and Solutions

| Cause | Solution |
|---|---|
| Non-existent Secret | Create Secret before Deployment |
| Missing key in Secret | Add key with kubectl edit secret |
| Missing referenced ConfigMap | Create required ConfigMap |
| Incorrect subPath | Verify path spelling |
# Create the missing secret
kubectl create secret generic app-secret \
  --from-literal=API_KEY=abc123

# Check references in the deployment
kubectl get deployment my-app -o yaml | grep -A5 "secretKeyRef"
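When a Secret reference is genuinely optional, marking it as such prevents a missing Secret or key from blocking container creation. A sketch reusing the app-secret created above:

```yaml
# Container env fragment: optional Secret reference
env:
- name: API_KEY
  valueFrom:
    secretKeyRef:
      name: app-secret
      key: API_KEY
      optional: true  # container starts even if the Secret or key is missing
```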

ConfigMaps and Secrets management is covered in detail in our Kubernetes Application Development guides.

OOMKilled: Memory Exceeded {#oomkilled}

OOMKilled means the container exceeded its memory limit and was killed by the Linux kernel. This is a protection to prevent the entire node from becoming unstable.

Diagnosis

# See termination reason
kubectl describe pod my-pod | grep -A3 "Last State"

# See restart history
kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[0].lastState}'

# Monitor memory consumption
kubectl top pod my-pod

Solution

# Increase the memory limit
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Double it if OOMKilled is frequent

# Analyze consumption before increasing (cgroup v1 path;
# on cgroup v2 nodes read /sys/fs/cgroup/memory.current instead)
kubectl exec my-pod -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes

Remember: Never set limits.memory too close to requests.memory. Leave roughly a 50% margin for peaks.

Liveness and Readiness Probes: Silent Failures {#probes-liveness-and-readiness}

Failing probes cause subtle behaviors. A failing liveness probe kills and restarts the container. A failing readiness probe removes the pod from the Service endpoints without killing it.

Diagnosis

# See probe failures
kubectl describe pod my-pod | grep -E "(Liveness|Readiness)"

# Manually test endpoint
kubectl exec my-pod -- curl -s localhost:8080/health
kubectl exec my-pod -- wget -qO- localhost:8080/ready

Common Errors

| Problem | Symptom | Solution |
|---|---|---|
| initialDelaySeconds too short | Container killed at startup | Increase to 30-60s |
| timeoutSeconds too short | Intermittent failures | Change from 1s to 5s |
| Incorrect port | Connection refused | Check containerPort |
| Incorrect path | 404 Not Found | Fix endpoint path |
# Robust probe configuration
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
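For slow-starting applications, a startupProbe is often a better fix than a large initialDelaySeconds: liveness and readiness checks are held off until it succeeds. A sketch reusing the same /health endpoint and port:

```yaml
# Startup probe: allows up to 30 × 10s = 300s for the app to come up
# before the liveness and readiness probes take over
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```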

Eviction: Pod Expelled from Node {#eviction}

Eviction occurs when a node runs out of critical resources. The kubelet expels pods to protect the node.

Diagnosis

# See node conditions
kubectl describe node my-node | grep -A5 "Conditions:"

# See evicted pods
kubectl get pods --field-selector=status.phase=Failed | grep Evicted

# Clean up evicted pods
kubectl delete pods --field-selector=status.phase=Failed

| Condition | Default Threshold | Cause |
|---|---|---|
| MemoryPressure | < 100Mi available | Pods consuming too much |
| DiskPressure | < 10% free space | Logs or images too large |
| PIDPressure | < 100 free PIDs | Too many processes |
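The thresholds above are kubelet defaults; they can be tuned through the kubelet configuration file. A sketch using the kubelet.config.k8s.io/v1beta1 API; the values shown are illustrative, not recommendations:

```yaml
# KubeletConfiguration fragment: hard eviction thresholds
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"  # evict when less than 200Mi of memory remains
  nodefs.available: "10%"    # evict when node filesystem drops below 10% free
  pid.available: "5%"        # evict when available PIDs drop below 5%
```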

Universal Diagnostic Toolkit

# Essential commands for all debugging
kubectl get pods -o wide                    # Extended view with node
kubectl describe pod <pod>                  # Complete details
kubectl logs <pod> --previous               # Previous crash logs
kubectl get events --sort-by=.lastTimestamp # Recent events
kubectl exec -it <pod> -- /bin/sh           # Shell in container
kubectl top pods                            # Resource consumption

Quick Diagnostic Script

#!/bin/bash
# Usage: ./diagnose.sh <pod-name>
POD="$1"
echo "=== Status ==="
kubectl get pod "$POD" -o wide
echo "=== Events ==="
kubectl describe pod "$POD" | grep -A15 "Events:"
echo "=== Logs (last 50 lines) ==="
kubectl logs "$POD" --tail=50
echo "=== Previous logs ==="
kubectl logs "$POD" --previous --tail=20 2>/dev/null || echo "No previous logs"

Prevention: Avoid Errors Before Deployment

Validate your manifests before applying them. 70% of organizations deploy Kubernetes with Helm according to Orca Security 2025. Use validation tools.

# Validate YAML syntax
kubectl apply --dry-run=client -f deployment.yaml

# Server-side validation (detects more errors)
kubectl apply --dry-run=server -f deployment.yaml

# With Helm
helm template my-release ./chart | kubectl apply --dry-run=server -f -

To go further with best practices, explore the differences between Helm and Kustomize and cloud-native development patterns.

Remember: Systematically test with --dry-run=server before every production deployment. This command detects configuration errors that --dry-run=client misses.

Training to Master Kubernetes Troubleshooting

As a CTO highlights in the Spectro Cloud State of Kubernetes 2025: "Just given the capabilities that exist with Kubernetes, and the company's desire to consume more AI tools, we will use Kubernetes more in future."

The Kubernetes Fundamentals training lets you discover debugging basics in 1 day. For complete mastery, the LFD459 Kubernetes for Developers training covers advanced troubleshooting over 3 days and prepares for CKAD certification. Infrastructure engineers preparing for CKA will find in-depth techniques in the LFS458 Kubernetes Administration training. For more, check our Kubernetes Application Development enterprise training for CTOs leading Cloud-Native transformation.

Check our Kubernetes training complete guide to identify the path suited to your profile, or discover training for system administrators and Kubernetes security challenges.