Key Takeaways
- ✓60% of cluster management time is spent on troubleshooting (Spectro Cloud 2025)
- ✓The 4 failure causes: image errors, resources, configuration, probes
- ✓Diagnosis resolved in <15 minutes with the right kubectl commands
TL;DR: A Kubernetes deployment failure refers to any situation where your pods don't reach Running status or your rollout remains stuck. The main causes are image errors, insufficient resources, misconfigurations, and probe issues. This guide provides the exact commands to diagnose and resolve each type of failure in less than 15 minutes.
To master these troubleshooting skills, discover the LFS458 Kubernetes Administration training.
Why Your Deployments Fail in 2026
Resolving Kubernetes deployment failures is a critical skill for any software engineer. According to the Spectro Cloud State of Kubernetes 2025 report, more than 60% of cluster management time is spent on troubleshooting. Even more concerning, IT teams spend an average of 34 working days per year resolving Kubernetes issues.
With 82% of container users now running Kubernetes in production, you must master these diagnostic techniques to maintain your SLAs.
Remember: A deployment failure costs an average of 2-4 hours of productivity. With this guide, you'll reduce that time to under 15 minutes.
Symptom and Quick Solution Index
Before diving into details, identify your symptom in this table to jump directly to the solution:
| Symptom | Pod Status | Probable Cause | Section |
|---|---|---|---|
| Pods won't start | Pending | Insufficient resources | Pending Pods |
| Repeated crashes | CrashLoopBackOff | Application or config error | CrashLoopBackOff |
| Inaccessible image | ImagePullBackOff | Registry or credentials | ImagePullBackOff |
| Stuck rollout | Progressing=False | Probes or resources | Stuck Rollout |
| Killed pod | OOMKilled | Insufficient memory | OOMKilled |
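To see which rows of the table apply across a whole namespace, it can help to count pods by status first. The sketch below assumes the default `kubectl get pods` column order (STATUS in the third column); `count_pod_statuses` is a hypothetical helper name, not a kubectl feature.

```shell
# Hypothetical helper: tally pods by the STATUS column of `kubectl get pods`.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
count_pod_statuses() {
  awk '{ count[$3]++ } END { for (s in count) print count[s], s }' | sort -k2
}

# Usage against a live cluster (the namespace name is a placeholder):
#   kubectl get pods -n your-namespace --no-headers | count_pod_statuses
```

Each output line is a count followed by a status, which you can match against the Pod Status column.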
Essential Diagnostic Commands
Before investigating, run these commands to get an overview of your deployment:
# Check deployment status
kubectl rollout status deployment/your-app -n your-namespace
# List pods with detailed status
kubectl get pods -n your-namespace -o wide
# View recent events (sorted by date)
kubectl get events -n your-namespace --sort-by='.lastTimestamp' | tail -20
# Describe the deployment to see conditions
kubectl describe deployment/your-app -n your-namespace
These commands form your basic Kubernetes observability checklist. For more advanced monitoring, see our Prometheus vs Datadog comparison.
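If you run this checklist often, the four commands can be grouped into one shell function. This is a minimal sketch: `deployment_overview` is a hypothetical name, and the `--timeout=10s` flag is an addition here so that the status check returns instead of blocking on a stuck rollout.

```shell
# Hypothetical wrapper around the four overview commands.
# Usage: deployment_overview <deployment> <namespace>
deployment_overview() {
  deploy="$1"
  ns="$2"
  # --timeout keeps the status check from waiting forever on a stuck rollout
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=10s
  kubectl get pods -n "$ns" -o wide
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' | tail -20
  kubectl describe "deployment/$deploy" -n "$ns"
}

# Example call (names are placeholders):
#   deployment_overview your-app your-namespace
```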
Pending Pods: Resources and Scheduling
Symptom
NAME READY STATUS RESTARTS AGE
your-app-7d4f 0/1 Pending 0 5m
Diagnosis
Examine the events to identify why the scheduler isn't placing your pod:
kubectl describe pod your-app-7d4f -n your-namespace | grep -A15 "Events:"
Causes and Solutions
| Event Message | Cause | Your Action |
|---|---|---|
| Insufficient cpu | Not enough available CPU | Reduce your requests or add nodes |
| Insufficient memory | Not enough memory | Adjust resources.requests.memory |
| node(s) had taint | Taints blocking scheduling | Add appropriate tolerations |
| no nodes available | No nodes in cluster | Check your nodes with kubectl get nodes |
Solution for Insufficient Resources
# Check your current requests
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "128Mi"  # Reduce if possible
        cpu: "100m"
      limits:
        memory: "256Mi"
        cpu: "500m"
Command to see available capacity:
kubectl describe nodes | grep -A5 "Allocated resources"
Remember: Your requests determine scheduling. If you request 4Gi of RAM but your nodes only have 2Gi available, your pod will stay in Pending indefinitely.
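The 4Gi versus 2Gi case can be checked with a little arithmetic. The sketch below is illustrative only: `to_mib` and `request_fits` are hypothetical helpers, only the `Mi` and `Gi` suffixes are handled, and the second argument stands for the memory on a node that is not yet claimed by other requests.

```shell
# Convert a Kubernetes memory quantity to MiB.
# Assumption: only the Mi and Gi suffixes are handled in this sketch.
to_mib() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *)   echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the request fits in the unclaimed memory of one node.
# Usage: request_fits <request> <unclaimed-memory-on-node>
request_fits() {
  [ "$(to_mib "$1")" -le "$(to_mib "$2")" ]
}

request_fits 4Gi 2Gi   || echo "4Gi request: no node can host it, pod stays Pending"
request_fits 128Mi 2Gi && echo "128Mi request: schedulable"
```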
ImagePullBackOff: Registry Issues
Symptom
NAME READY STATUS RESTARTS AGE
your-app-8k2m 0/1 ImagePullBackOff 0 3m
Diagnosis
# See the exact error message
kubectl describe pod your-app-8k2m | grep -A5 "Warning.*Failed"
Causes and Solutions
| Error | Your Diagnosis | Solution |
|---|---|---|
| manifest unknown | Non-existent tag | Verify tag with docker pull image:tag |
| unauthorized | Missing credentials | Create an imagePullSecret |
| connection refused | Inaccessible registry | Test network access to registry |
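The table can be turned into a small filter that reads pod events and prints the matching action. This is a sketch under assumptions: `explain_pull_error` is a hypothetical helper, and the exact error wording varies by container runtime and registry, so the three patterns may need adjusting.

```shell
# Hypothetical helper: map pull errors read from stdin to the actions in the table.
explain_pull_error() {
  while IFS= read -r line; do
    case "$line" in
      *"manifest unknown"*)   echo "tag not found: check the image tag" ;;
      *unauthorized*)         echo "missing credentials: create an imagePullSecret" ;;
      *"connection refused"*) echo "registry unreachable: test network access" ;;
    esac
  done
}

# Usage (pod and namespace names are placeholders):
#   kubectl describe pod your-app-8k2m -n your-namespace | explain_pull_error
```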
Create an imagePullSecret
If you're using a private registry, configure your credentials:
kubectl create secret docker-registry my-registry-secret \
  --docker-server=your-registry.io \
  --docker-username=your-user \
  --docker-password=your-password \
  -n your-namespace
Then reference it in your deployment:
spec:
  template:
    spec:
      imagePullSecrets:
      - name: my-registry-secret
Follow containerization best practices to avoid these issues.
Stuck Rollout: Analyze and Unblock
Symptom
Your deployment remains stuck with a rollout that won't progress:
$ kubectl rollout status deployment/your-app
Waiting for deployment "your-app" rollout to finish: 1 old replicas are pending termination...
Diagnosis
# See deployment conditions
kubectl get deployment your-app -o jsonpath='{.status.conditions[*].message}'
# Compare ReplicaSets
kubectl get rs -n your-namespace | grep your-app
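To script the check, you can read the `Progressing` condition directly. When a rollout exceeds the deployment's `progressDeadlineSeconds` (600 seconds by default), Kubernetes sets that condition's reason to `ProgressDeadlineExceeded`. The function names below are hypothetical; this is a sketch, not a complete health check.

```shell
# Print the reason of a deployment's Progressing condition.
# Usage: progressing_reason <deployment> <namespace>
progressing_reason() {
  kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}'
}

# Succeeds when the rollout exceeded its progress deadline.
rollout_timed_out() {
  [ "$(progressing_reason "$1" "$2")" = "ProgressDeadlineExceeded" ]
}

# Usage (names are placeholders):
#   rollout_timed_out your-app your-namespace && echo "rollout is stuck"
```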
Solutions by Cause
Probes too strict: If your new pods fail their readiness checks, they never become Ready and the rollout never completes.
# Check probes
kubectl get pod your-app-xxx -o jsonpath='{.spec.containers[0].readinessProbe}'
Adjust your probes if necessary:
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30  # Increase if your app starts slowly
  periodSeconds: 10
  failureThreshold: 3
Emergency rollback if you need to return to the previous version:
# View revision history
kubectl rollout history deployment/your-app
# Rollback to previous revision
kubectl rollout undo deployment/your-app
# Or rollback to specific revision
kubectl rollout undo deployment/your-app --to-revision=2
Remember: According to Mend.io, 67% of organizations have delayed deployments due to Kubernetes security or configuration issues. Test your manifests in a staging environment before production.
OOMKilled: Memory Management
Symptom
$ kubectl describe pod your-app-xxx | grep -i oom
Reason: OOMKilled
Diagnosis
# See current memory consumption
kubectl top pod your-app-xxx
# See configured limits
kubectl get pod your-app-xxx -o jsonpath='{.spec.containers[0].resources.limits.memory}'
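After a restart, the reason for the previous termination is recorded under `lastState` in the pod status, so the OOM check can be scripted. The sketch below looks only at the first container, and the function names are hypothetical.

```shell
# Print why the first container of a pod last terminated.
# Usage: last_termination_reason <pod> <namespace>
last_termination_reason() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
}

# Succeeds when the previous container instance was killed for exceeding memory.
was_oom_killed() {
  [ "$(last_termination_reason "$1" "$2")" = "OOMKilled" ]
}

# Usage (names are placeholders):
#   was_oom_killed your-app-xxx your-namespace && echo "raise the memory limit"
```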
Solution
Increase your memory limit or optimize your application:
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"  # Increase this value
For a Full-Stack Kubernetes developer, understanding resource management is essential. The LFD459 training covers these aspects in detail.
Centralize Your Logs for Effective Diagnosis
Pod logs disappear when pods are deleted, so centralize your logs to keep evidence of past failures.
Check our Loki vs Elasticsearch comparison to choose your solution. A complete observability stack with Prometheus and Grafana, adopted by 67% of organizations in production, allows you to detect problems before they impact your users.
# View logs from all pods in a deployment
kubectl logs -l app=your-app --all-containers=true -f
# Logs from previous pods (after a crash)
kubectl logs your-app-xxx --previous
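When several pods are crashing, collecting the previous logs one pod at a time gets tedious. The loop below is a sketch: `previous_logs` is a hypothetical helper, it assumes your pods carry the `app=<name>` label used above, and the 50-line tail is an arbitrary choice.

```shell
# Print the last lines logged by the previous container instance of each pod.
# Usage: previous_logs <app-label-value> <namespace>
previous_logs() {
  for pod in $(kubectl get pods -l "app=$1" -n "$2" -o name); do
    echo "=== $pod ==="
    # --previous fails for pods that never restarted; skip those quietly
    kubectl logs "$pod" -n "$2" --previous --tail=50 2>/dev/null || true
  done
}

# Usage (names are placeholders):
#   previous_logs your-app your-namespace
```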
Prevent Deployment Failures
Pre-Deployment Checklist
Validate systematically before each deployment:
# Validate YAML syntax
kubectl apply --dry-run=client -f deployment.yaml
# Test in a staging namespace
kubectl apply -f deployment.yaml -n staging
# Check namespace quotas
kubectl describe quota -n your-namespace
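These checks can be chained so that a failure stops the sequence before anything reaches production. This is a minimal sketch: `predeploy_check` is a hypothetical function, and it assumes a `staging` namespace exists and that the manifest does not pin a different namespace.

```shell
# Hypothetical gate: run the checks in order and stop at the first failure.
# Usage: predeploy_check <manifest> <target-namespace>
predeploy_check() {
  manifest="$1"
  target_ns="$2"
  kubectl apply --dry-run=client -f "$manifest" >/dev/null \
    || { echo "client-side validation failed" >&2; return 1; }
  kubectl apply -f "$manifest" -n staging \
    || { echo "staging apply failed" >&2; return 1; }
  # Show remaining quota in the namespace you will deploy to next
  kubectl describe quota -n "$target_ns"
}

# Usage (names are placeholders):
#   predeploy_check deployment.yaml your-namespace
```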
Best Practices
- Configure PodDisruptionBudgets to avoid interruptions during rollouts
- Use appropriate probes for your application (liveness, readiness, startup)
- Define realistic resource requests and limits based on your metrics
- Test your images locally before pushing them
To deepen these practices, check our complete Kubernetes training guide and explore the Kubernetes monitoring and troubleshooting modules.
Develop Your Troubleshooting Skills
A software engineer preparing for the LFS458 Kubernetes Administration training acquires practical skills to diagnose and resolve these problems effectively. Kubernetes training for system administrators also provides an excellent foundation.
Take action with SFEIR Institute trainings:
- LFS458 Kubernetes Administration: 4 days to master cluster administration and troubleshooting
- LFD459 Kubernetes for Developers: 3 days to deploy your applications error-free
- Kubernetes Fundamentals: 1 day to discover the basics if you're starting out
Contact our advisors to plan your training and transform your deployment failures into successful deployments.