Key Takeaways
- ✓ 80% of organizations run K8s in production with 20+ clusters (Spectro Cloud 2025).
- ✓ Three main causes: misconfigured HPA/VPA, insufficient resources, missing metrics.
- ✓ Diagnosis starts with `kubectl describe hpa` and cluster event analysis.
Automatic scaling in Kubernetes can fail for multiple reasons: resource limits, misconfigured metrics, or infrastructure constraints. For any Cloud Operations engineer pursuing the Kubernetes CKS certification, knowing how to diagnose and resolve these scaling problems is a critical production skill.
TL;DR
Scaling problems fall into three categories: incorrect HPA/VPA configuration, insufficient cluster resources, and missing metrics. Diagnosis starts with `kubectl describe hpa` and cluster event analysis.
To master these skills, explore the LFS458 Kubernetes Administration training.
According to the Spectro Cloud State of Kubernetes 2025 report, 80% of organizations run Kubernetes in production with an average of 20+ clusters. At this scale, scaling problems directly impact service availability.
Why Isn't HPA Scaling? Diagnosis for Cloud Operations Engineers with Kubernetes CKS Certification
The Horizontal Pod Autoscaler (HPA) is a controller that automatically adjusts the number of replicas in a Deployment based on observed metrics. When it stops working, several causes are possible.
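The scaling decision itself is simple arithmetic, which helps when reasoning about why a given HPA does or does not act. A minimal sketch of the documented formula, with illustrative numbers (not taken from any real cluster):

```shell
# desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
# Illustrative values: 4 replicas running at 90% of their CPU request, target 80%.
current_replicas=4
current_cpu=90   # observed utilization, % of requests
target_cpu=80    # HPA target, % of requests

# Integer ceiling division:
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # prints "desired replicas: 5"
```

Note that the controller also applies a tolerance (0.1 by default, via the `--horizontal-pod-autoscaler-tolerance` flag), so ratios close to 1.0 trigger no change at all.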
Check the HPA status:
kubectl describe hpa my-app -n production
Typical output for a failing HPA:
Name:          my-app
Namespace:     production
Reference:     Deployment/my-app
Metrics:       ( current / target )
  resource cpu on pods (as a percentage of request):  <unknown> / 80%
Min replicas:  2
Max replicas:  10
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    ReadyForNewScale         recommended size matches current size
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count
Key takeaway: `<unknown>` in the metrics column means the metrics-server isn't collecting data. Verify that metrics-server is deployed and functional.
kubectl get pods -n kube-system | grep metrics-server
kubectl top pods -n production
The Kubernetes tutorials and practical guides detail the complete metrics-server configuration.
Are Resource Requests Defined Correctly?
HPA calculates CPU/memory usage as a percentage of requests. Without defined requests, the calculation is impossible.
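The ratio HPA needs can be sketched with hypothetical numbers (a pod consuming 180m CPU against a 200m request):

```shell
usage_m=180     # current CPU usage in millicores, as shown by `kubectl top pods`
request_m=200   # spec.resources.requests.cpu of the container
utilization=$(( usage_m * 100 / request_m ))
echo "utilization: ${utilization}%"   # prints "utilization: 90%"
# Without a request, the denominator does not exist: this is what surfaces
# as <unknown> in `kubectl describe hpa`.
```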
Incorrect configuration:
# Incorrect - No requests = HPA cannot calculate
spec:
  containers:
  - name: app
    image: my-app:v1
    resources:
      limits:
        cpu: "1"
        memory: "512Mi"
Correct configuration:
# Correct - Requests defined = HPA works
spec:
  containers:
  - name: app
    image: my-app:v1
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"
Audit your Deployments (note that this jsonpath only inspects the first container of each Pod template):
kubectl get deployments -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: requests={.spec.template.spec.containers[0].resources.requests}{"\n"}{end}'
As TealHQ recommends in their Kubernetes DevOps guide: "Don't let your knowledge remain theoretical - set up a real Kubernetes environment to solidify your skills."
How to Resolve Custom Metrics Issues?
For scaling on application metrics (requests/second, latency), you must configure a custom metrics adapter.
| Metric Type | Source | Required Adapter |
|---|---|---|
| CPU/Memory | kubelet | metrics-server (native) |
| Custom metrics | Prometheus | prometheus-adapter |
| External metrics | Datadog, CloudWatch | External metrics adapter |
Deploy prometheus-adapter:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
--namespace monitoring \
--set prometheus.url=http://prometheus-server.monitoring.svc
Configure a custom metric rule:
# prometheus-adapter-config.yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
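The `name` rule renames the Prometheus series before it is exposed to HPA. The same rewrite, reproduced with `sed` purely for illustration:

```shell
# ^(.*)_total$ -> ${1}_per_second, as in the adapter's name rule
exposed=$(echo "http_requests_total" | sed -E 's/^(.*)_total$/\1_per_second/')
echo "$exposed"   # prints "http_requests_per_second"
```

The HPA object must then reference `http_requests_per_second`, not the raw Prometheus series name.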
Key takeaway: Custom metrics require explicit adapter configuration. Without a corresponding rule, HPA reports errors such as FailedGetPodsMetric or FailedGetExternalMetric.
The Kubernetes deployment FAQ answers common questions about this configuration.
Does the Cluster Have Enough Resources to Scale?
Even with a correctly configured HPA, scaling fails if the cluster lacks capacity.
Identify Pending Pods:
kubectl get pods -A | grep Pending
kubectl describe pod <pending-pod> -n <namespace>
Typical message:
Events:
  Type     Reason            Message
  ----     ------            -------
  Warning  FailedScheduling  0/5 nodes are available: 5 Insufficient cpu.
Analyze cluster capacity:
kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated resources"
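The scheduler compares the new Pod's requests (not its actual usage) against each node's free allocatable capacity. With hypothetical figures, the arithmetic behind a FailedScheduling event looks like this:

```shell
allocatable_m=4000   # node allocatable CPU, millicores
allocated_m=3800     # sum of CPU requests already scheduled on the node
pod_request_m=500    # request of the Pod the HPA is trying to add

headroom=$(( allocatable_m - allocated_m ))
if [ "$pod_request_m" -gt "$headroom" ]; then
  echo "unschedulable: needs ${pod_request_m}m, only ${headroom}m free"
fi
```

This is why oversized requests waste capacity: the node above may be nearly idle in real usage and still refuse new Pods.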
Solutions:
- Add nodes via Cluster Autoscaler
- Optimize requests: they are often oversized
- Use a PriorityClass to prioritize critical workloads:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production workloads"
Multi-environment Kubernetes management explains how to isolate resources by environment.
How to Diagnose VPA Problems? Cloud Operations Engineer Approach for Kubernetes CKS Certification
The Vertical Pod Autoscaler (VPA) adjusts requests/limits instead of replica count. Its problems differ from HPA.
Install VPA:
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
Check VPA recommendations:
kubectl describe vpa my-app-vpa -n production
Common issues:
| Symptom | Cause | Solution |
|---|---|---|
| No recommendations | Insufficient history | Wait minimum 24h |
| Pods restarting in loop | updateMode: "Auto" | Switch to "Off" or "Initial" mode |
| OOMKilled after VPA | Memory recommendation too low | Set higher minAllowed |
VPA configuration with guardrails:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
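The effect of the guardrails is a simple clamp: VPA's raw recommendation is forced into the [minAllowed, maxAllowed] interval before being applied. A sketch with hypothetical values (a raw recommendation of 80m CPU against 100m-2000m bounds):

```shell
clamp() {  # clamp <value> <min> <max>
  v=$1
  if [ "$v" -lt "$2" ]; then v=$2; fi
  if [ "$v" -gt "$3" ]; then v=$3; fi
  echo "$v"
}

applied=$(clamp 80 100 2000)    # raw 80m is below minAllowed
echo "applied CPU request: ${applied}m"   # prints "applied CPU request: 100m"
```

This is why a well-chosen minAllowed prevents the OOMKilled scenario from the table: the recommendation can never drop below your floor.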
Key takeaway: Never combine CPU HPA and VPA on the same Deployment. Use HPA on custom metrics if you enable VPA.
For deeper understanding, consult the Kubernetes system administrator training.
How to Optimize Cluster Scaling with Cluster Autoscaler?
Cluster Autoscaler adds or removes nodes based on Pod demand. Its proper functioning is critical for horizontal scaling.
Check Cluster Autoscaler logs:
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=100
Frequent errors:
scale_up: group my-node-pool max size reached
Adjust node pool limits (GKE example):
gcloud container clusters update my-cluster \
--enable-autoscaling \
--min-nodes=3 \
--max-nodes=50 \
--node-pool=default-pool
According to the CNCF Annual Survey 2025 report, 82% of container users run Kubernetes in production. Autoscaling has become standard practice.
The resolving Kubernetes deployment errors section covers other troubleshooting scenarios.
How to Debug Scaling Issues Related to PodDisruptionBudgets?
A too restrictive PDB can block scaling by preventing Pod eviction.
Identify PDBs:
kubectl get pdb -A
kubectl describe pdb my-app-pdb -n production
Problematic configuration:
# Incorrect - Blocks any eviction if 3 replicas or fewer
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: my-app
Balanced configuration:
# Correct - Allows eviction of one Pod at a time
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
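The eviction budget behind these two configurations reduces to simple arithmetic. With a hypothetical 3-replica Deployment, the restrictive minAvailable: 3 PDB leaves nothing to evict:

```shell
replicas=3
min_available=3
# allowedDisruptions = healthyReplicas - minAvailable
allowed=$(( replicas - min_available ))
echo "allowed disruptions: $allowed"   # prints "allowed disruptions: 0"
```

With maxUnavailable: 1, the budget stays at 1 as long as all replicas are healthy, so scale-downs and node drains can always make progress one Pod at a time.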
Key takeaway: Use `maxUnavailable` rather than `minAvailable` to avoid blockages during scale-down.
To understand the impact on deployments, consult migrating to a GitOps architecture.
Kubernetes Scaling Diagnostic Checklist
Run this sequence when facing a scaling problem:
# 1. HPA status
kubectl get hpa -A
kubectl describe hpa <name> -n <namespace>
# 2. Metrics-server functional
kubectl top pods -n <namespace>
kubectl top nodes
# 3. Cluster capacity
kubectl describe nodes | grep -E "(Allocatable|Allocated)"
# 4. Pending pods
kubectl get pods -A --field-selector=status.phase=Pending
# 5. Recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
# 6. Cluster Autoscaler logs
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=50
The CI/CD pipeline for Kubernetes integrates these checks into deployment tests.
Take Action: Master Kubernetes Scaling
As a CTO interviewed by Spectro Cloud stated: "Just given the capabilities that exist with Kubernetes, and the company's desire to consume more AI tools, we will use Kubernetes more in future." Mastering scaling is essential to support this growth.
To develop these skills:
- The LFS458 Kubernetes Administration training covers autoscaling and troubleshooting over 4 days
- The LFS460 Kubernetes Security training prepares for CKS certification
- For a complete introduction, discover Kubernetes Fundamentals
Contact us for personalized guidance.