
Kubernetes Autoscaling: HPA, VPA and Automatic Scaling Explained

SFEIR Institute

Key Takeaways

  • HPA: horizontal scaling (number of Pods)
  • VPA: vertical scaling (CPU and memory per Pod)
  • Cluster Autoscaler: automatic node adjustment

Kubernetes autoscaling is the cluster's ability to automatically adjust resources allocated to workloads based on actual demand. The Horizontal Pod Autoscaler (HPA) increases or decreases the number of Pods, while the Vertical Pod Autoscaler (VPA) adjusts CPU and memory requests for each Pod. These mechanisms ensure optimal performance and cost control without manual intervention.

Key takeaway: Kubernetes cluster automatic scaling relies on three pillars: HPA for horizontal scaling, VPA for vertical scaling, and Cluster Autoscaler for nodes. Master these concepts to pass your CKA certification.

This skill is at the core of the LFS458 Kubernetes Administration training.

What is autoscaling in Kubernetes?

Autoscaling refers to all mechanisms allowing your cluster to adapt its resources to workload. Unlike manual scaling where you adjust replicas or resource limits, autoscaling automatically reacts to observed metrics.

According to the Spectro Cloud State of Kubernetes 2025 report, 80% of organizations run Kubernetes in production with an average of 20+ clusters. At this scale, manual scaling becomes impossible. You must automate to maintain service availability.

Kubernetes offers three types of autoscaling:

Type                 Target   Action                     Use case
HPA                  Pods     Adjusts replica count      Variable traffic
VPA                  Pods     Modifies requests/limits   Unpredictable workloads
Cluster Autoscaler   Nodes    Adds/removes nodes         Global capacity

To explore Kubernetes deployment and production, you must master these three mechanisms.

Why is automatic scaling critical in production?

Your applications face constant load variations. Without autoscaling, you face two problems:

Under-provisioning: your Pods saturate, latencies explode, your users suffer 503 errors. Your SLA is compromised.

Over-provisioning: according to ScaleOps, 65%+ of workloads consume less than half their requested resources. You're paying for unused resources.

Key takeaway: Kubernetes autoscaling with HPA and VPA lets you balance performance and costs. Configure it correctly to avoid resource waste.

If you're preparing for CKA certification as a system administrator, you must know how to configure HPA. This skill explicitly appears in the curriculum. Consult our complete Kubernetes Training guide to structure your preparation.

How does the Horizontal Pod Autoscaler (HPA) work?

The Kubernetes Horizontal Pod Autoscaler monitors your Pods' metrics and adjusts the replica count. Here's how it works:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Decode this configuration:

  1. scaleTargetRef: you target the api-server Deployment
  2. minReplicas: 2: even without load, you maintain 2 Pods for high availability
  3. maxReplicas: 10: you limit scaling to control costs
  4. averageUtilization: 70: HPA scales up when CPU usage exceeds 70%

The HPA controller queries the Metrics Server every 15 seconds by default. It calculates the required replica count with this formula:

desiredReplicas = ceil(currentReplicas × (currentMetric / desiredMetric))
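To make the formula concrete, here is a minimal Python sketch of that calculation (the function name is ours, not part of the controller):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, desired_metric: float) -> int:
    """HPA formula: ceil(currentReplicas * (currentMetric / desiredMetric))."""
    return math.ceil(current_replicas * (current_metric / desired_metric))

# 4 Pods averaging 90% CPU against a 70% target -> scale up to 6
print(desired_replicas(4, 90, 70))   # 6
# 6 Pods averaging 35% CPU against a 70% target -> scale down to 3
print(desired_replicas(6, 35, 70))   # 3
```

Because of the ceiling, HPA rounds up: it would rather run one Pod too many than leave the target utilization exceeded.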

To verify your HPA works, run:

kubectl get hpa api-hpa -w

You'll see current metrics and desired replica count. If you encounter problems, consult our article on Kubernetes scaling problems: diagnosis and solutions.

How to configure the Vertical Pod Autoscaler (VPA)?

The VPA adjusts your containers' requests and limits. Unlike HPA, it doesn't modify the Pod count but optimizes resources allocated to each Pod.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi

Understand VPA modes:

Mode      Behavior               Impact
Off       Recommendations only   No restart
Initial   Applied at startup     New Pods only
Auto      Applied dynamically    Restarts Pods

Key takeaway: VPA in Auto mode restarts your Pods to apply new resources. Ensure your application tolerates restarts before enabling this mode in production.
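A cautious pattern before enabling Auto in production is to run VPA in Off mode and simply read its recommendations. A sketch (the object name is illustrative):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa-reco        # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"       # recommendations only, no Pod restarts

Then `kubectl describe vpa api-vpa-reco` shows the recommended requests in the object's status, which you can apply manually or feed into your manifests.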

System administrators preparing for CKA certification must understand when to prefer HPA or VPA. Kubernetes monitoring and troubleshooting helps you collect necessary metrics.

When to use HPA versus VPA?

You must choose the right mechanism for your workload:

Use HPA when:

  • Your application is stateless and scales horizontally
  • You handle web traffic with predictable peaks
  • Your Pods start quickly (< 30 seconds)

Use VPA when:

  • Your application is stateful or single-instance
  • You don't know resource needs in advance
  • Your Pods have long startup times

Combine HPA and VPA with caution. As the ScaleOps guide explains, both can conflict if you scale on the same metrics. Use HPA on custom metrics and VPA on CPU/memory.
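As a sketch of that split, HPA can target a per-Pod custom metric (here a hypothetical `http_requests_per_second`, assumed to be exposed through a custom metrics adapter such as prometheus-adapter) while VPA keeps ownership of CPU and memory:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa-custom      # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "100"              # target 100 req/s per Pod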

For progressive deployments, combine autoscaling with Canary Deployment on Kubernetes strategies.

How to integrate autoscaling in your CI/CD pipeline?

Your autoscaling configuration must be versioned and deployed via GitOps. Here's a recommended workflow:

# Validate your HPA before deployment
kubectl apply --dry-run=client -f hpa.yaml

# Deploy via ArgoCD or Flux
argocd app sync my-application

Integrate load tests in your pipeline to validate scaling thresholds:

# pipeline-ci.yaml (GitHub Actions example)
- name: Load test
  run: |
    k6 run --vus 100 --duration 5m loadtest.js
    kubectl get hpa -o wide

Consult our guide setting up a CI/CD pipeline for Kubernetes to automate your deployments.

To go further with the GitOps approach, explore GitOps and Kubernetes: principles, tools, and implementation and our migration guide to GitOps architecture.

What metrics to monitor for effective autoscaling?

The Metrics Server provides CPU and memory by default. But you can scale on custom metrics:

Metric               Source                   Use case
CPU/Memory           Metrics Server           Basic scaling
Requests/second      Prometheus               API Gateway
Queue depth          CloudWatch/Stackdriver   Async workers
Active connections   Custom metrics           WebSockets
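For metrics that live outside the cluster, such as queue depth, HPA supports the External metric type. A sketch, assuming an external metrics adapter exposes a metric named `queue_depth` and that a `queue-worker` Deployment exists (both names are hypothetical):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker      # hypothetical worker Deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: queue_depth   # hypothetical metric from an external metrics adapter
      target:
        type: AverageValue
        averageValue: "30"  # target ~30 queued messages per Pod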

According to CNCF Annual Survey 2025, 82% of container users run Kubernetes in production. At this scale, you must finely instrument your applications.

Autoscaling becomes crucial for AI/ML workloads with their variable resource consumption.

Key takeaway: Define your scaling metrics based on your SLOs. CPU scaling doesn't suit all workloads. Measure what matters to your users.

What are common errors to avoid?

Here are pitfalls you must avoid:

1. Forgetting resource requests. HPA computes utilization as a percentage of requests. Without defined requests, CPU-based scaling doesn't work.

# Bad - no requests
resources:
  limits:
    cpu: 1

# Good - explicit requests
resources:
  requests:
    cpu: 200m
  limits:
    cpu: 1

2. Overly aggressive thresholds. An averageUtilization of 90% leaves little margin. Prefer 60-70% to absorb peaks while new Pods start.

3. Ignoring the cooldown. By default, HPA waits 5 minutes before scaling down. Adjust the controller flag --horizontal-pod-autoscaler-downscale-stabilization if needed.
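Since autoscaling/v2 you can also tune this per HPA object through the behavior field, rather than changing the controller flag cluster-wide. A fragment to merge into an HPA spec such as api-hpa above:

# Fragment for an HPA spec: per-object scale-down tuning
behavior:
  scaleDown:
    stabilizationWindowSeconds: 180  # default is 300 seconds
    policies:
    - type: Percent
      value: 50                      # remove at most 50% of Pods per period
      periodSeconds: 60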

4. HPA/VPA conflicts. Never let HPA and VPA act on the same metric: you create an unstable feedback loop.

Take action: train on Kubernetes scaling

Autoscaling is a key skill for any system administrator aiming for CKA certification. According to the Linux Foundation, the CKA exam lasts 2 hours with a 66% passing score. Practical questions include HPA configuration.

As a TechiesCamp testimonial notes: "The CKA exam tested practical, useful skills. It wasn't just theory - it matched real-world situations you'd actually run into when working with Kubernetes."

Prepare effectively with SFEIR Institute.

CKA certification is valid for 2 years (Linux Foundation source). With over 104,000 people having taken the exam according to CNCF reports (49% annual growth), this certification remains a major differentiator in the market.

Contact our advisors on our contact page to plan your training and accelerate your CKA preparation.