
Kubernetes Scaling Problems: Diagnosis and Solutions

SFEIR Institute

Key Takeaways

  • 80% of organizations run K8s in production with 20+ clusters (Spectro Cloud 2025).
  • Three main causes: misconfigured HPA/VPA, insufficient resources, missing metrics.
  • Diagnosis starts with `kubectl describe hpa` and cluster event analysis.

Automatic scaling in Kubernetes can fail for multiple reasons: resource limits, misconfigured metrics, or infrastructure constraints. For any Cloud Operations engineer pursuing Kubernetes CKS certification, knowing how to diagnose and resolve these Kubernetes scaling problems is a critical production skill.

TL;DR

Scaling problems fall into three categories: incorrect HPA/VPA configuration, insufficient cluster resources, and missing metrics. Diagnosis starts with `kubectl describe hpa` and cluster event analysis.

To master these skills, explore the LFS458 Kubernetes Administration training.

According to the Spectro Cloud State of Kubernetes 2025 report, 80% of organizations run Kubernetes in production with an average of 20+ clusters. At this scale, scaling problems directly impact service availability.

Why Isn't HPA Scaling? Diagnosis for Cloud Operations Engineers with Kubernetes CKS Certification

The Horizontal Pod Autoscaler (HPA) is a controller that automatically adjusts the number of replicas in a Deployment based on observed metrics. When it stops working, several causes are possible.
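For reference, a minimal CPU-based HPA looks like this (an autoscaling/v2 sketch; the Deployment name, namespace, and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: production
spec:
  scaleTargetRef:          # the workload whose replica count is adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # percent of the containers' CPU requests
```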

Check the HPA status:

kubectl describe hpa my-app -n production

Typical output for a failing HPA:

Name:                                                  my-app
Namespace:                                             production
Reference:                                             Deployment/my-app
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 80%
Min replicas:                                          2
Max replicas:                                          10
Conditions:
  Type            Status  Reason                   Message
  ----            ------  ------                   -------
  AbleToScale     True    ReadyForNewScale         recommended size matches current size
  ScalingActive   False   FailedGetResourceMetric  the HPA was unable to compute the replica count
Key takeaway: `<unknown>` in the metrics column means the metrics-server isn't collecting data. Verify that metrics-server is deployed and functional:

kubectl get pods -n kube-system | grep metrics-server
kubectl top pods -n production

The Kubernetes tutorials and practical guides detail the complete metrics-server configuration.

Are Resource Requests Defined Correctly?

HPA calculates CPU/memory usage as a percentage of the Pods' requests. Without defined requests, the calculation is impossible.
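Concretely, the controller applies desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A quick shell sketch with illustrative numbers (not from a real cluster):

```shell
# HPA formula: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# Illustrative numbers: 4 replicas running at 160% of their CPU requests,
# with an 80% utilization target.
replicas=4; current=160; target=80
echo $(( (replicas * current + target - 1) / target ))   # integer ceiling -> 8
```

With no requests defined, the "percentage of request" term is undefined, which is exactly why the HPA reports `<unknown>`.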

Incorrect configuration:

# Incorrect - No requests = HPA cannot calculate
spec:
  containers:
  - name: app
    image: my-app:v1
    resources:
      limits:
        cpu: "1"
        memory: "512Mi"

Correct configuration:

# Correct - Requests defined = HPA works
spec:
  containers:
  - name: app
    image: my-app:v1
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"

Audit your Deployments (note that this jsonpath only inspects each Deployment's first container):

kubectl get deployments -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: requests={.spec.template.spec.containers[0].resources.requests}{"\n"}{end}'

As TealHQ recommends in their Kubernetes DevOps guide: "Don't let your knowledge remain theoretical - set up a real Kubernetes environment to solidify your skills."

How to Resolve Custom Metrics Issues?

For scaling on application metrics (requests/second, latency), you must configure a custom metrics adapter.

Metric Type       Source               Required Adapter
CPU/Memory        kubelet              metrics-server (native)
Custom metrics    Prometheus           prometheus-adapter
External metrics  Datadog, CloudWatch  External metrics adapter

Deploy prometheus-adapter:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc

Configure a custom metric rule:

# prometheus-adapter-config.yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'

Key takeaway: Custom metrics require explicit adapter configuration. Without a corresponding rule, HPA reports FailedGetPodsMetric (or FailedGetExternalMetric for external sources).
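Once the adapter exposes a metric such as http_requests_per_second, an HPA can consume it through the Pods metric type. A sketch, assuming the adapter rule above and the same my-app Deployment:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods                        # per-Pod custom metric via the adapter
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"           # illustrative: scale above 100 req/s per Pod
```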

The Kubernetes deployment FAQ answers common questions about this configuration.

Does the Cluster Have Enough Resources to Scale?

Even with a correctly configured HPA, scaling fails if the cluster lacks capacity.

Identify Pending Pods:

kubectl get pods -A | grep Pending
kubectl describe pod <pending-pod> -n <namespace>

Typical message:

Events:
Type     Reason            Message
----     ------            -------
Warning  FailedScheduling  0/5 nodes are available: 5 Insufficient cpu.

Analyze cluster capacity:

kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated resources"

Solutions:

  1. Add nodes via Cluster Autoscaler
  2. Optimize requests: often oversized
  3. Use priorities: PriorityClass for critical workloads

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production workloads"
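Workloads opt into a PriorityClass by name. A minimal sketch (the Deployment name is hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-critical-app      # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-critical-app
  template:
    metadata:
      labels:
        app: my-critical-app
    spec:
      priorityClassName: high-priority   # lower-priority Pods may be preempted
      containers:
      - name: app
        image: my-app:v1
```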

Multi-environment Kubernetes management explains how to isolate resources by environment.

How to Diagnose VPA Problems? Cloud Operations Engineer Approach for Kubernetes CKS Certification

The Vertical Pod Autoscaler (VPA) adjusts requests/limits instead of replica count. Its problems differ from HPA.

Install VPA:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Check VPA recommendations:

kubectl describe vpa my-app-vpa -n production

Common issues:

Symptom                  Cause                          Solution
No recommendations       Insufficient history           Wait minimum 24h
Pods restarting in loop  UpdateMode: Auto               Switch to Off or Initial mode
OOMKilled after VPA      Memory recommendation too low  Set higher minAllowed
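Per the table, the safest starting point is recommendation-only mode, which computes targets without ever evicting Pods (sketch):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # compute recommendations only; apply them manually
```

Inspect the result with `kubectl describe vpa`, then move to Initial or Auto once the recommendations look sane.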

VPA configuration with guardrails:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"

Key takeaway: Never combine CPU HPA and VPA on the same Deployment. Use HPA on custom metrics if you enable VPA.

For deeper understanding, consult the Kubernetes system administrator training.

How to Optimize Cluster Scaling with Cluster Autoscaler?

Cluster Autoscaler adds or removes nodes based on Pod demand. Its proper functioning is critical for horizontal scaling.

Check Cluster Autoscaler logs:

kubectl logs -n kube-system -l app=cluster-autoscaler --tail=100

Frequent errors:

scale_up: group my-node-pool max size reached

Adjust node pool limits (GKE example):

gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=50 \
  --node-pool=default-pool

According to the CNCF Annual Survey 2025 report, 82% of container users run Kubernetes in production. Autoscaling has become a standard.

The resolving Kubernetes deployment errors section covers other troubleshooting scenarios.

Are Pod Disruption Budgets Blocking Scaling?

An overly restrictive PodDisruptionBudget (PDB) can block scaling by preventing Pod eviction.

Identify PDBs:

kubectl get pdb -A
kubectl describe pdb my-app-pdb -n production

Problematic configuration:

# Incorrect - Blocks any eviction if 3 replicas or fewer
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: my-app

Balanced configuration:

# Correct - Allows eviction of one Pod at a time
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app

Key takeaway: Use maxUnavailable rather than minAvailable to avoid blockages during scale-down.

To understand the impact on deployments, consult migrating to a GitOps architecture.

Kubernetes Scaling Diagnostic Checklist

Run this sequence when facing a scaling problem:

# 1. HPA status
kubectl get hpa -A
kubectl describe hpa <name> -n <namespace>

# 2. Metrics-server functional
kubectl top pods -n <namespace>
kubectl top nodes

# 3. Cluster capacity
kubectl describe nodes | grep -E "(Allocatable|Allocated)"

# 4. Pending pods
kubectl get pods -A --field-selector=status.phase=Pending

# 5. Recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20

# 6. Cluster Autoscaler logs
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=50

The CI/CD pipeline for Kubernetes integrates these checks into deployment tests.

Take Action: Master Kubernetes Scaling

As a CTO interviewed by Spectro Cloud stated: "Just given the capabilities that exist with Kubernetes, and the company's desire to consume more AI tools, we will use Kubernetes more in future." Mastering scaling is essential to support this growth.

To develop these skills, contact us for personalized guidance.