
Debug a Pod in CrashLoopBackOff on Kubernetes: Causes and Solutions

SFEIR Institute

Key Takeaways

  • 23% of Kubernetes production incidents are related to CrashLoopBackOff (Komodor 2024)
  • kubectl logs --previous displays logs from the previous crashed container
  • Exponential backoff: increasing delays between restart attempts

Debugging a pod in CrashLoopBackOff is one of the most in-demand Kubernetes troubleshooting skills. According to Komodor's State of Kubernetes 2024 report, CrashLoopBackOff accounts for 23% of production incidents. This guide details the causes, a diagnostic methodology, and solutions for each scenario. A backend developer or software engineer must master these techniques to keep applications stable.

TL;DR: CrashLoopBackOff means the container starts, crashes, and Kubernetes tries to restart it with exponential backoff. Main causes are: application error, missing configuration, insufficient resources, or image problem. Use kubectl describe and kubectl logs --previous to diagnose.

To master Kubernetes troubleshooting, follow the LFS458 Kubernetes Administration training.

What Exactly is CrashLoopBackOff?

CrashLoopBackOff is a pod state indicating that the main container crashes repeatedly. Kubernetes applies an exponential restart delay (backoff) between attempts: 10s, 20s, 40s, up to a maximum of 5 minutes.
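The schedule can be sketched as a delay that doubles after each crash, capped at five minutes. This is an illustrative loop, not Kubernetes code:

```shell
# Sketch of the restart backoff schedule: delay doubles per crash, capped at 300s
delay=10
for attempt in 1 2 3 4 5 6; do
  echo "restart attempt ${attempt}: wait ${delay}s"
  delay=$(( delay * 2 ))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
# prints waits of 10s, 20s, 40s, 80s, 160s, 300s
```

Note that kubelet resets this backoff once a container has run successfully for a while, so a pod that crashes only occasionally may never reach the 5-minute cap.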

This technical definition hides a frustrating operational reality: the pod never runs long enough to be debugged from inside.

# Identify pods in CrashLoopBackOff
kubectl get pods -A | grep CrashLoopBackOff

# Example output
NAMESPACE   NAME                      READY   STATUS             RESTARTS   AGE
production  checkout-7d4b5c6f9-x2k4n  0/1     CrashLoopBackOff   15         12m

Key takeaway: The RESTARTS counter indicates the number of restarts. A high number (>10) suggests a persistent problem requiring thorough investigation.

How to Debug a Pod in CrashLoopBackOff: Methodology

Troubleshooting a pod stuck in a restart loop follows a systematic three-step approach.

Step 1: Collect Basic Information

# Complete pod details
kubectl describe pod checkout-7d4b5c6f9-x2k4n -n production

# Key points to examine in output:
# - Events (end of output)
# - State / Last State
# - Exit Code
# - Reason

The exit code often reveals the cause:

Exit Code | Meaning           | Probable Cause
0         | Success           | Container terminated normally (not expected for a server)
1         | Application error | Unhandled exception, config error
137       | SIGKILL (OOM)     | Memory limit exceeded
139       | SIGSEGV           | Segmentation fault
143       | SIGTERM           | Graceful termination failed
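Exit codes above 128 follow a convention: the container was killed by a signal, and the signal number is the exit code minus 128. A quick check:

```shell
# Exit codes > 128 encode the killing signal: signal = exit_code - 128
exit_code=137
signal=$(( exit_code - 128 ))
echo "exit ${exit_code} => signal ${signal}"   # 9 = SIGKILL, typical of an OOM kill
```

The same arithmetic maps 139 to signal 11 (SIGSEGV) and 143 to signal 15 (SIGTERM).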

Step 2: Examine Previous Container Logs

# Previous crash logs
kubectl logs checkout-7d4b5c6f9-x2k4n -n production --previous

# If multiple containers
kubectl logs checkout-7d4b5c6f9-x2k4n -n production -c main --previous

This command retrieves logs from the container before its crash, essential for understanding the error.

Step 3: Analyze Namespace Events

# Events sorted by timestamp
kubectl get events -n production --sort-by='.lastTimestamp' | tail -20

Events reveal scheduling problems, image pulls, or volume mounting issues.

For a global monitoring vision, see the Monitoring and Troubleshooting Kubernetes module.

Main Causes and Solutions for a Pod in CrashLoopBackOff

Cause 1: Application Error at Startup

The container starts but the application crashes immediately. This is the most common cause (45% of cases according to Komodor).

Symptoms:

Exit Code: 1
Reason: Error

Diagnosis:

# Application logs
kubectl logs checkout-7d4b5c6f9-x2k4n --previous

# Example output
Error: Cannot connect to database at postgres:5432

Solutions:

# 1. Add an init container to wait for dependencies
initContainers:
- name: wait-for-db
  image: busybox:1.36
  command: ['sh', '-c', 'until nc -z postgres 5432; do sleep 2; done']

# 2. Configure readiness/liveness probes correctly
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

Cause 2: Missing Configuration (ConfigMap/Secret)

The container tries to read an environment variable or configuration file that doesn't exist.

Symptoms:

State: Waiting
Reason: CreateContainerConfigError

Diagnosis:

# Check referenced ConfigMaps
kubectl describe pod checkout-7d4b5c6f9-x2k4n | grep -A5 "Environment"

# Verify ConfigMap exists
kubectl get configmap checkout-config -n production

Solutions:

# Make variable optional
env:
- name: DATABASE_URL
  valueFrom:
    configMapKeyRef:
      name: checkout-config
      key: database-url
      optional: true  # Pod starts even if absent

Key takeaway: Use optional: true for non-critical configurations. Validate required configurations in an init container.
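Validating required configuration in an init container can look like the following minimal sketch, assuming DATABASE_URL is the variable the application cannot start without:

```
# Fail fast, with a clear message, if a required variable is absent
initContainers:
- name: check-config
  image: busybox:1.36
  command: ['sh', '-c', 'test -n "$DATABASE_URL" || { echo "DATABASE_URL missing"; exit 1; }']
  env:
  - name: DATABASE_URL
    valueFrom:
      configMapKeyRef:
        name: checkout-config
        key: database-url
        optional: true
```

The pod still ends up failing, but the init container's log states exactly which key is missing instead of leaving you to reverse-engineer an application stack trace.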

Cause 3: OOMKilled (Memory Exceeded)

The container exceeds its memory limit and is killed by the kernel.

Symptoms:

Exit Code: 137
Reason: OOMKilled
Last State: Terminated

Diagnosis:

# Check memory consumption before crash
kubectl top pod checkout-7d4b5c6f9-x2k4n --containers

# Compare with limits
kubectl get pod checkout-7d4b5c6f9-x2k4n -o jsonpath='{.spec.containers[0].resources}'

Solutions:

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"  # Increase if necessary
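To raise the limit without editing the full manifest, a strategic-merge patch file can be applied to the Deployment. This is a sketch, assuming the container is named checkout as in the running example:

```
# patch-memory.yaml — strategic merge patch raising only the memory limit
spec:
  template:
    spec:
      containers:
      - name: checkout
        resources:
          limits:
            memory: "1Gi"
```

Apply it with kubectl patch deployment checkout -n production --patch-file patch-memory.yaml, which triggers a rolling restart with the new limit.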

For a detailed guide, see Resolve OOMKilled errors.

Cause 4: Container Image Problem

The image cannot be pulled or the entrypoint is incorrect.

Symptoms:

State: Waiting
Reason: ImagePullBackOff
# or
Reason: CrashLoopBackOff with Exit Code: 127 (command not found)

Diagnosis:

# Check pull events
kubectl describe pod checkout-7d4b5c6f9-x2k4n | grep -A3 "Events"

# Test locally
docker run --rm myregistry/checkout:v1.2.3 /bin/sh -c "echo test"

Solutions:

# Check imagePullSecret
imagePullSecrets:
- name: registry-credentials

# Fix command/entrypoint
command: ["/app/checkout"]  # Absolute path
args: ["--port=8080"]

Cause 5: Misconfigured Probes

Liveness probes kill the container before it's ready.

Symptoms:

Events:
Liveness probe failed: connection refused
Container checkout-container failed liveness probe, will be restarted

Diagnosis: If your application takes 30 seconds to start, your liveness probe must not fire before those 30 seconds have elapsed (set initialDelaySeconds accordingly). Aggressive probes are the leading cause of self-inflicted CrashLoopBackOff.

# Check probe timing
kubectl get pod checkout-7d4b5c6f9-x2k4n -o yaml | grep -A10 livenessProbe

Solutions:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60  # Wait for startup
  periodSeconds: 10
  failureThreshold: 3      # 3 failures before restart

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5   # Faster than liveness
  periodSeconds: 5

Key takeaway: readinessProbe should be faster than livenessProbe. Start with conservative values then optimize.
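With the liveness values above, the minimum time the container gets before a probe-triggered restart is roughly initialDelaySeconds + failureThreshold × periodSeconds. A quick sanity check of that budget:

```shell
# Approximate grace before a liveness-triggered restart:
# initialDelaySeconds + failureThreshold * periodSeconds
initial_delay=60; period=10; failure_threshold=3
grace=$(( initial_delay + failure_threshold * period ))
echo "container has roughly ${grace}s before a liveness restart"
```

If that number is smaller than your application's worst-case startup time, the probe, not the application, is causing the crash loop.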

Advanced Kubernetes Debugging Techniques

Using kubectl debug (Kubernetes 1.25+)

Ephemeral containers allow attaching a debug container to a running or crashed pod.

# Attach debug container
kubectl debug -it checkout-7d4b5c6f9-x2k4n --image=busybox:1.36 --target=checkout

# Debug with network tools
kubectl debug -it checkout-7d4b5c6f9-x2k4n --image=nicolaka/netshoot

Copy Pod for Debugging

# Create copy with modified command
kubectl debug checkout-7d4b5c6f9-x2k4n -it --copy-to=checkout-debug \
--container=checkout -- /bin/sh

# Debug pod remains active for investigation

Examine Container Runtime Logs

# On the node (requires SSH access)
crictl logs <container-id>

# Find container ID
kubectl get pod checkout-7d4b5c6f9-x2k4n -o jsonpath='{.status.containerStatuses[0].containerID}'
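The containerID returned by that jsonpath includes a runtime scheme prefix (e.g. containerd://) that crictl does not expect, so strip it first. The ID below is a hypothetical example value:

```shell
# containerID comes back as "<runtime>://<id>"; crictl wants only the <id> part
container_id="containerd://3f2a9c1e4b7d"  # hypothetical example value
short_id=${container_id#*//}              # strip everything up to and including "//"
echo "$short_id"
```

The stripped ID can then be passed directly: crictl logs "$short_id".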

Quick Troubleshooting Checklist

Use this checklist for systematic diagnosis:

#!/bin/bash
# debug-crashloop.sh <pod-name> <namespace>

POD=$1
NS=${2:-default}

echo "=== 1. Pod State ==="
kubectl get pod $POD -n $NS

echo "=== 2. Description ==="
kubectl describe pod $POD -n $NS | tail -30

echo "=== 3. Previous Logs ==="
kubectl logs $POD -n $NS --previous --tail=50 2>/dev/null || echo "No previous logs"

echo "=== 4. Events ==="
kubectl get events -n $NS --field-selector involvedObject.name=$POD

echo "=== 5. Resources ==="
kubectl top pod $POD -n $NS --containers 2>/dev/null || echo "Metrics not available"

Also see the guide Resolve Kubernetes deployment failures for a complementary approach.

Preventing CrashLoopBackOff in Production

Configuration Best Practices

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: checkout
        image: myregistry/checkout:v1.2.3

        # Explicit resources
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi

        # Well-calibrated probes
        startupProbe:
          httpGet:
            path: /health
            port: 8080
          failureThreshold: 30
          periodSeconds: 10

        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          periodSeconds: 10
          failureThreshold: 3

        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
          failureThreshold: 3

Key takeaway: startupProbe (K8s 1.20+) replaces initialDelaySeconds for slow-starting applications. It prevents liveness from killing the container during startup.
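The startup budget granted by that startupProbe is failureThreshold × periodSeconds; the liveness probe only takes over once the startup probe succeeds. A quick check of the values above:

```shell
# Maximum startup time allowed by the startupProbe before the container is restarted
failure_threshold=30; period=10
budget=$(( failure_threshold * period ))
echo "application may take up to ${budget}s to start"
```

Five minutes is deliberately generous; tighten failureThreshold once you have measured real startup times.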

Proactive Monitoring

Configure alerts before the problem affects users:

# PrometheusRule
- alert: PodCrashLooping
expr: |
increase(kube_pod_container_status_restarts_total[1h]) > 5
for: 5m
labels:
severity: warning
annotations:
summary: "Pod {{ $labels.pod }} in CrashLoop"

The Kubernetes observability checklist in production details these configurations.

Network Issues Causing Crashes

Network issues can cause indirect CrashLoopBackOff (application that times out and crashes).

Symptoms:

  • Logs showing connection timeouts
  • Exit code 1 after delay

Diagnosis:

# From a debug pod
kubectl run debug --rm -it --image=nicolaka/netshoot -- /bin/bash

# Network tests
nslookup kubernetes.default
curl -v http://checkout-service.production.svc.cluster.local:8080/health

See the guide Network problems diagnosis and resolution for more detail.

When to Escalate and Ask for Help

Some CrashLoopBackOff situations require advanced expertise:

  • Exit code 139 (SIGSEGV): memory bug in application, requires profiling
  • Intermittent problems: may indicate race conditions or node issues
  • After cluster update: possible API incompatibilities

Kubernetes deployment and production covers rollback strategies for problematic deployments.

Trainings to Master Kubernetes Troubleshooting

Troubleshooting pod restart errors is a key skill evaluated in the CKA and CKAD certifications.

To develop your debugging expertise, check upcoming sessions or contact us for a custom path.