## Key Takeaways
- ✓ OOMKilled occurs when a container exceeds its allocated memory limit
- ✓ Proper sizing of requests and limits prevents OOMKilled errors
- ✓ Proactive monitoring detects memory overruns before an incident occurs
Understanding OOMKilled resource limits in Kubernetes is fundamental for any backend or full-stack developer deploying containerized applications. These errors occur when a container exceeds its allocated memory limit, causing the Linux kernel to terminate it abruptly. This guide explains how to diagnose, prevent, and resolve these critical issues.
TL;DR: The OOMKilled error indicates that a container was killed for exceeding memory. The solution involves proper sizing of requests and limits, proactive monitoring, and application code optimization.
To master Kubernetes resource management, take the LFD459 Kubernetes for Application Developers training.
## What is the OOMKilled Resource Limits Kubernetes Error?
Definition: OOMKilled (Out Of Memory Killed) is the state assigned to a container terminated by the Linux kernel's OOM Killer when memory consumption exceeds the defined limit.
The OOMKilled resource limits Kubernetes error occurs in two scenarios:
| Code | Meaning | Cause |
|---|---|---|
| OOMKilled (137) | Container limit exceeded | `limits.memory` set too low for the workload |
| OOMKilled (137) | Node under memory pressure | Kernel OOM Killer terminates processes to protect the node |
According to Cloud Native Now, IT teams spend 34 workdays per year resolving Kubernetes issues, with a significant portion related to resources.
Key takeaway: OOMKilled is not an application crash but a system protection against memory exhaustion.
## How to Diagnose an OOMKilled Error?
Accurately diagnosing an OOMKilled error requires several complementary commands.
### Diagnostic Commands
```bash
# Check pod status
kubectl describe pod <pod-name> | grep -A 5 "Last State"

# Examine recent events
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'

# View real-time memory metrics
kubectl top pod <pod-name>

# Check defined limits
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources}'
```
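If you prefer to automate the first check, the same information can be pulled from `kubectl get pod <pod-name> -o json`. Below is a minimal Python sketch; the field path `status.containerStatuses[].lastState.terminated.reason` comes from the pod API object, while `oomkilled_containers` and the sample data are illustrative:

```python
def oomkilled_containers(pod: dict) -> list:
    """Return names of containers whose last termination reason was OOMKilled."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            hits.append(status["name"])
    return hits

# Minimal sample mimicking the relevant slice of a pod object
sample = {
    "status": {
        "containerStatuses": [
            {"name": "api",
             "lastState": {"terminated": {"reason": "OOMKilled",
                                          "exitCode": 137}}},
            {"name": "sidecar", "lastState": {}},
        ]
    }
}
print(oomkilled_containers(sample))  # ['api']
```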
### Identifying the Root Cause
```text
# Example `kubectl describe pod` output
Last State:   Terminated
  Reason:     OOMKilled
  Exit Code:  137
  Started:    Mon, 28 Feb 2026 10:15:00 +0100
  Finished:   Mon, 28 Feb 2026 10:17:23 +0100
```
Definition: Exit code 137 corresponds to SIGKILL (128 + 9), the signal sent by the OOM Killer.
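The "128 + signal number" convention can be decoded mechanically. A minimal sketch (`decode_exit_code` is an illustrative helper, not a kubectl feature):

```python
import signal

def decode_exit_code(code: int) -> str:
    """Exit codes above 128 encode 'killed by signal (code - 128)'."""
    if code > 128:
        return f"killed by signal {signal.Signals(code - 128).name}"
    return f"exited normally with status {code}"

print(decode_exit_code(137))  # killed by signal SIGKILL
print(decode_exit_code(0))    # exited normally with status 0
```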
The monitoring and troubleshooting Kubernetes section details resource monitoring tools.
## What is the Difference Between Requests and Limits?
Understanding the distinction between requests and limits is essential for avoiding OOMKilled errors.
### Comparison Table
| Aspect | Requests | Limits |
|---|---|---|
| Role | Minimum guarantee | Maximum allowed |
| Scheduling | Used for pod placement | Not used |
| Exceeding | Tolerated while the node has capacity | Triggers OOMKilled (memory) or throttling (CPU) |
| Recommendation | Average consumption | Maximum peak + margin |
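The scheduling row of the table deserves emphasis: placement decisions consider requests only, never limits. A toy sketch of that rule (values in MiB; `fits_on_node` is illustrative, not actual scheduler code):

```python
def fits_on_node(node_allocatable_mib: int,
                 scheduled_requests_mib: list,
                 new_pod_request_mib: int) -> bool:
    """A pod fits if the sum of requests stays within node allocatable memory."""
    return sum(scheduled_requests_mib) + new_pod_request_mib <= node_allocatable_mib

# A 4 GiB node already hosting pods requesting 3 GiB in total:
print(fits_on_node(4096, [1024, 2048], 512))   # True  (3.5 GiB <= 4 GiB)
print(fits_on_node(4096, [1024, 2048], 2048))  # False (5 GiB > 4 GiB)
```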
### Configuration Example
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-backend
spec:
  containers:
  - name: api
    image: my-api:1.2.0
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
```
Key takeaway: Set requests to the observed average consumption and limits to 1.5x-2x that value.
Consult the guide on Kubernetes deployment failures for other common errors.
## How to Properly Size Memory Resources?
Optimal sizing avoids both OOMKilled errors and cluster resource waste.
### Sizing Methodology
1. Measure actual consumption in production
2. Analyze peaks and baseline
3. Calculate requests = P50, limits = P99 + 20%
4. Validate under load with stress tests
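The calculation in step 3 can be sketched in pure-stdlib Python; the 24h sample series is synthetic and `size_from_samples` is an illustrative helper:

```python
from statistics import quantiles

def size_from_samples(usage_mib: list) -> dict:
    """requests = P50 of observed usage, limits = P99 plus 20% headroom."""
    pcts = quantiles(usage_mib, n=100)  # pcts[49] ~ P50, pcts[98] ~ P99
    return {
        "requests_mib": round(pcts[49]),
        "limits_mib": round(pcts[98] * 1.2),
    }

# 24h of hypothetical per-minute memory samples oscillating around 270 MiB
samples = [240 + (i % 60) for i in range(1440)]
print(size_from_samples(samples))
```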
```bash
# Observe consumption over 24h
kubectl top pod <pod-name> --containers
```

```promql
# Prometheus query for container memory usage
container_memory_usage_bytes{pod="<pod-name>"}
```
According to the Spectro Cloud State of Kubernetes 2025 report, 80% of organizations run Kubernetes in production with an average of 20+ clusters, making proper sizing critical at scale.
### Reference Values by Application Type
| Type | Memory Requests | Memory Limits |
|---|---|---|
| Lightweight REST API | 128Mi | 256Mi |
| Java/JVM Application | 512Mi | 1Gi |
| Node.js Service | 256Mi | 512Mi |
| Batch Worker | Variable | 2x requests |
The Kubernetes observability checklist includes resource metrics to monitor.
## How to Prevent OOMKilled Resource Limits Kubernetes Errors in Production?
Prevention is more effective than correction. Implement these protective mechanisms.
### Configure Prometheus Alerts
```yaml
# Alert for a container approaching its memory limit
groups:
- name: kubernetes-resources
  rules:
  - alert: ContainerMemoryNearLimit
    expr: |
      (container_memory_usage_bytes / container_spec_memory_limit_bytes) > 0.85
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container {{ $labels.container }} near its memory limit"
```
According to Grafana Labs, 75% of teams use Prometheus and Grafana for Kubernetes monitoring.
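The alert expression is simple arithmetic: fire when usage divided by limit exceeds 0.85. The same check in plain Python (`near_memory_limit` is an illustrative helper):

```python
def near_memory_limit(usage_bytes: int, limit_bytes: int,
                      threshold: float = 0.85) -> bool:
    """True when memory usage exceeds the given fraction of the limit."""
    return limit_bytes > 0 and usage_bytes / limit_bytes > threshold

limit = 512 * 1024 * 1024                           # 512Mi limit
print(near_memory_limit(450 * 1024 * 1024, limit))  # True  (~88% of limit)
print(near_memory_limit(400 * 1024 * 1024, limit))  # False (~78% of limit)
```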
### Implement LimitRanges
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: memory-defaults
  namespace: production
spec:
  limits:
  - default:
      memory: "512Mi"
    defaultRequest:
      memory: "256Mi"
    type: Container
```
Definition: A LimitRange defines default values and resource constraints for containers in a namespace.
Key takeaway: Combine LimitRanges and ResourceQuotas to govern resources at the namespace level.
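For completeness, a ResourceQuota capping aggregate memory in the same namespace might look like this (a minimal sketch; the name and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: memory-quota
  namespace: production
spec:
  hard:
    requests.memory: "8Gi"   # sum of all requests in the namespace
    limits.memory: "16Gi"    # sum of all limits in the namespace
```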
## How to Optimize an Application Experiencing OOMKilled Errors?
Sometimes, increasing limits is not enough. Application optimization is necessary.
### Optimization Checklist
| Action | Impact | Complexity |
|---|---|---|
| Analyze memory leaks | High | Medium |
| Reduce JVM heap size | Medium | Low |
| Implement streaming | High | High |
| Optimize DB queries | Medium | Medium |
### JVM Configuration for Containers
```bash
# JVM options to respect Kubernetes limits
java -XX:+UseContainerSupport \
     -XX:MaxRAMPercentage=75.0 \
     -XX:InitialRAMPercentage=50.0 \
     -jar app.jar
```
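What `MaxRAMPercentage=75.0` means in practice: the JVM caps its heap at 75% of the container's memory limit, leaving headroom for metaspace, thread stacks, and native allocations. The arithmetic, as a sketch (`max_heap_mib` is an illustrative helper):

```python
def max_heap_mib(limit_mib: int, max_ram_percentage: float = 75.0) -> int:
    """Heap ceiling the JVM derives from the container memory limit."""
    return int(limit_mib * max_ram_percentage / 100)

print(max_heap_mib(512))   # 384 -> heap capped at 384Mi for a 512Mi limit
print(max_heap_mib(1024))  # 768
```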
Consult the Kubernetes deployment and production section for configuration best practices.
## How to Use Vertical Pod Autoscaler to Automatically Adjust Resources?
VPA automates resource sizing based on actual usage.
### Installation and Configuration
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-backend
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        memory: "128Mi"
      maxAllowed:
        memory: "2Gi"
```
Definition: The Vertical Pod Autoscaler (VPA) automatically adjusts container requests and limits based on observed usage.
The comparison Prometheus vs Datadog helps you choose the right monitoring tool to feed VPA.
Key takeaway: In "Auto" mode, VPA restarts pods to apply new values. Use "Off" to observe recommendations without automatic action.
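When observing in "Off" mode, the recommendations appear in the VPA object's status. An illustrative parser for `kubectl get vpa api-vpa -o json` output (the field path `status.recommendation.containerRecommendations` comes from the VPA API; `vpa_targets` and the sample are illustrative):

```python
def vpa_targets(vpa: dict) -> dict:
    """Map each container name to its recommended target resources."""
    recs = (vpa.get("status", {})
               .get("recommendation", {})
               .get("containerRecommendations", []))
    return {r["containerName"]: r["target"] for r in recs}

sample = {"status": {"recommendation": {"containerRecommendations": [
    {"containerName": "api",
     "target": {"cpu": "250m", "memory": "384Mi"}}]}}}
print(vpa_targets(sample))  # {'api': {'cpu': '250m', 'memory': '384Mi'}}
```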
## Advanced Debugging with kubectl and System Tools
For complex cases, access the container directly to analyze memory.
### Advanced Diagnostic Commands
```bash
# Execute a shell in the container
kubectl exec -it <pod> -- /bin/sh

# Check memory from inside the container (cgroup v1 paths;
# on cgroup v2 use /sys/fs/cgroup/memory.max and memory.current)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/memory.usage_in_bytes

# Analyze memory-consuming processes
kubectl exec <pod> -- top -o %MEM

# Attach an ephemeral container for debugging
kubectl debug -it <pod> --image=nicolaka/netshoot --target=<container>
```
The Kubernetes system administrator training covers these debugging techniques in depth.
Also consult the article on debugging CrashLoopBackOff pods for restart loop scenarios.
## Take Action: Train in Kubernetes Development
Resource management is a key area of the CKAD exam. With a passing score of 66% in 2 hours, mastery of requests/limits is directly tested.
As The Enterprisers Project notes: "Anybody can learn Kubernetes. With abundant documentation and development tools available online, teaching yourself Kubernetes is very much within reach."
The LFD459 Kubernetes for Application Developers training covers resource management, debugging, and CKAD preparation in 3 days. For an overview of fundamentals, consult Kubernetes Fundamentals.
Additional resources:
- Monitoring and troubleshooting Kubernetes
- Production observability checklist
- Production deployment guide
Contact our advisors to plan your CKAD training.