Key Takeaways
- ✓ 10 best practices cover images, resources, probes, RBAC and network policies
- ✓ 71% of Fortune 100 companies apply these practices to achieve optimal reliability
Deploying Kubernetes in production? With 82% of container users running Kubernetes in production in 2025, Kubernetes production best practices are no longer optional.
They determine the difference between a stable cluster and 3 AM incidents.
Kubernetes is the container orchestration system that automates deployment, scaling, and management of containerized applications. This guide presents essential recommendations to optimize your clusters, validated by organizations managing an average of 20+ clusters in production.
TL;DR: This checklist covers 10 essential best practices: optimized images, resource requests and limits, configured probes, isolated namespaces, strict RBAC, network policies, centralized monitoring, GitOps, encrypted secrets, and controlled deployment strategies. Apply them systematically to avoid 80% of common incidents.
These skills are at the core of the LFS458 Kubernetes Administration training.
Why are these best practices critical for your production?
What are Kubernetes production best practices? This question guides every DevOps team migrating to cloud-native. According to Spectro Cloud, 80% of organizations now run Kubernetes in production. However, operational complexity remains the major challenge.
Key takeaway: 71% of Fortune 100 companies use Kubernetes in production. These organizations apply rigorous practices you must adopt to reach their reliability level.
Let's now explore each practice in detail. For an overview of fundamental concepts, consult our Kubernetes Training: Complete Guide.
1. Optimize your container images to reduce attack surface
Why it's essential: Bulky images increase deployment times, consume more storage, and expand your attack surface. Every unnecessary binary represents a potential vulnerability you must eliminate.
How to proceed:
- Use minimal base images. An Alpine image weighs ~3MB compared to ~70MB for Ubuntu.
- Apply multi-stage builds. You can reduce your images from 800MB to 15-30MB.
- Target images under 200MB for your microservices.
```dockerfile
# Build stage
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o myapp

# Production stage
FROM alpine:3.19
RUN adduser -D -u 1000 appuser
USER appuser
COPY --from=builder /app/myapp /myapp
ENTRYPOINT ["/myapp"]
```
To deepen this practice, consult our Optimize a Dockerfile for Kubernetes guide.
Key takeaway: Systematically scan your images with Trivy or Grype before each deployment. Integrate this scan into your CI/CD pipeline.
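This takeaway wires directly into CI. A minimal sketch as a GitLab CI job that fails the pipeline on serious CVEs (the job name, stage, and image tag are illustrative assumptions, not from this article):

```yaml
# Illustrative GitLab CI job: block the pipeline on HIGH/CRITICAL vulnerabilities
scan-image:
  stage: test
  image: aquasec/trivy:latest
  script:
    - trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:1.2.3
```

With `--exit-code 1`, Trivy returns a non-zero status when a matching vulnerability is found, so the job fails and the image never reaches your registry.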
2. Define strict resource requests and limits
Why it's essential: Without resource limits, a failing pod can consume all node resources and impact your other workloads. You risk cascade effects across your entire cluster.
How to proceed:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: application-prod
spec:
  containers:
    - name: app
      image: myapp:1.2.3
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "500m"
```
Apply these rules:
- Requests = observed average consumption of your application
- Limits = 1.5x to 2x requests to absorb peaks
- Configure LimitRanges per namespace to enforce defaults
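The LimitRange from the last bullet can be sketched as follows (the namespace name and values are illustrative; tune them to your observed consumption):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: prod  # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container declares no requests
        cpu: "100m"
        memory: "128Mi"
      default:          # applied when a container declares no limits
        cpu: "200m"
        memory: "256Mi"
```

Any pod created in this namespace without explicit requests or limits inherits these defaults instead of running unbounded.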
Consult our Docker and Kubernetes Cheatsheet for quick diagnostic commands.
3. Configure health probes adapted to your application
Why it's essential: Kubernetes cannot guess if your application is actually working. Probes detect failures and redirect traffic automatically.
How to configure your three probe types:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-probes
spec:
  containers:
    - name: app
      image: myapp:1.2.3
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
```
The livenessProbe detects whether your container is stuck, the readinessProbe indicates whether the pod is ready to receive traffic, and the startupProbe handles slow-starting applications.
Key takeaway: Never point your livenessProbe to an external dependency (database, third-party API). A dependency timeout should not trigger cascading restarts of your pods.
4. Isolate your workloads with dedicated namespaces
Why it's essential: Namespaces create logical boundaries between your teams, environments, and applications. You can thus apply specific policies to each scope.
How to structure your namespaces:
```shell
# Create namespaces by environment and team
kubectl create namespace prod-team-payment
kubectl create namespace prod-team-catalog
kubectl create namespace staging-team-payment

# Apply ResourceQuotas per namespace
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: prod-team-payment
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
EOF
```
This isolation is fundamental to understanding the differences between Kubernetes and Docker in workload management.
5. Implement RBAC with the principle of least privilege
Why it's essential: Overly permissive access exposes your cluster to human errors and compromises. 70% of organizations use Helm, often with excessive permissions.
How to apply RBAC correctly:
```yaml
# Role limited to reading pods in a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: prod-team-payment
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
# Binding to a specific ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: prod-team-payment
subjects:
  - kind: ServiceAccount
    name: monitoring-sa
    namespace: prod-team-payment
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```
Rules to follow:
- Use Roles (namespaced) rather than ClusterRoles
- Create a dedicated ServiceAccount per application
- Regularly audit permissions with `kubectl auth can-i --list`
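To audit what a specific ServiceAccount is allowed to do, combine this command with impersonation (the ServiceAccount shown matches the RoleBinding example above):

```shell
# List everything the monitoring ServiceAccount may do in its namespace
kubectl auth can-i --list \
  --as=system:serviceaccount:prod-team-payment:monitoring-sa \
  -n prod-team-payment

# Spot-check a single permission (prints "yes" or "no")
kubectl auth can-i delete pods \
  --as=system:serviceaccount:prod-team-payment:monitoring-sa \
  -n prod-team-payment
```

If the second command answers "yes" for a read-only account, your bindings are broader than intended.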
To deepen Kubernetes security, the LFS460 Kubernetes Security Essentials training covers these aspects in depth.
6. Secure the network with Network Policies
Why it's essential: By default, all pods can communicate with each other. You must explicitly restrict these flows to limit compromise spread.
How to implement zero-trust networking:
```yaml
# Default policy: deny all ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod-team-payment
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow only traffic from frontend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: prod-team-payment
spec:
  podSelector:
    matchLabels:
      app: payment-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```
Consult our Kubernetes Monitoring and Troubleshooting guide to diagnose network connectivity issues.
Key takeaway: Test your Network Policies before deploying to production. Use kubectl exec to validate that authorized traffic passes and unauthorized traffic is blocked.
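A quick connectivity check of the kind suggested above could look like this (the workload names and the `debug-pod` are hypothetical; adapt them to your cluster):

```shell
# From an authorized frontend pod: this request should succeed
kubectl exec -n prod-team-payment deploy/frontend -- \
  wget -qO- -T 3 http://payment-api:8080/healthz

# From an unrelated pod: this request should time out if the policy works
kubectl exec -n prod-team-payment pod/debug-pod -- \
  wget -qO- -T 3 http://payment-api:8080/healthz
```

Run both checks after every policy change: a deny-all policy that also blocks legitimate traffic is discovered far more cheaply here than in production.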
7. Centralize your monitoring and logs
Why it's essential: Without centralized observability, you cannot effectively diagnose incidents in a distributed environment. Every minute of MTTR (Mean Time To Resolution) counts.
Recommended stack:
- Prometheus + Grafana for metrics
- Loki or Elasticsearch for logs
- Jaeger or Tempo for distributed tracing
```yaml
# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payment-api-monitor
  labels:
    team: payment
spec:
  selector:
    matchLabels:
      app: payment-api
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
Essential metrics to monitor:
- Error rates (HTTP 5xx)
- P95 and P99 latency
- CPU/memory usage vs limits
- Pod restart count
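These metrics translate directly into alerts. A sketch of a PrometheusRule for the first metric, the 5xx error rate (the metric and label names such as `http_requests_total` are assumptions that depend on your instrumentation, and your Prometheus may require specific labels on the rule object to select it):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payment-api-alerts
spec:
  groups:
    - name: payment-api
      rules:
        - alert: HighErrorRate
          # Fire when more than 5% of requests return HTTP 5xx over 5 minutes
          expr: |
            sum(rate(http_requests_total{app="payment-api", code=~"5.."}[5m]))
              / sum(rate(http_requests_total{app="payment-api"}[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "More than 5% HTTP 5xx on payment-api"
```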
To resolve common issues, refer to Docker and Kubernetes Troubleshooting: resolve frequent errors.
8. Adopt GitOps for reproducible deployments
Why it's essential: Manual modifications via kubectl apply create technical debt and environment drift. GitOps ensures your cluster always reflects the declared state in Git.
How to implement GitOps:
```yaml
# ArgoCD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/k8s-manifests.git
    targetRevision: main
    path: apps/payment-service/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: prod-team-payment
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
Key takeaway: Enable automatic reconciliation but keep `prune: false` initially. Switch to `prune: true` only when you master your workflow.
If you're migrating from Docker Compose, our Migrate to Kubernetes from Docker Compose, VMs or monoliths guide accompanies you step by step.
9. Encrypt and manage your secrets correctly
Why it's essential: Kubernetes Secrets are base64-encoded, not encrypted. Without additional protection, they're readable by anyone with access to the API server or etcd.
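You can verify this yourself: base64 is a reversible encoding, not encryption, so anyone who can read the Secret object can recover the value without any key (the value below is a made-up example):

```shell
# base64 decodes with no key: this is encoding, not encryption
echo "c3VwZXItc2VjcmV0LXBhc3N3b3Jk" | base64 -d
# → super-secret-password
```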
How to secure your secrets:
```yaml
# Encryption-at-rest configuration for the API server.
# This is a static file on the control-plane node, referenced by the
# kube-apiserver --encryption-provider-config flag; it is NOT applied with kubectl.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-secret>
      - identity: {}
```
Recommended alternatives:
- External Secrets Operator with AWS Secrets Manager or HashiCorp Vault
- Sealed Secrets to store encrypted secrets in Git
- SOPS for YAML file encryption
```shell
# Create a SealedSecret
kubeseal --format=yaml < secret.yaml > sealed-secret.yaml
kubectl apply -f sealed-secret.yaml
```
10. Master your deployment strategies
Why it's essential: A poorly configured deployment can cause total service unavailability. You must choose the strategy suited to your risk tolerance.
| Strategy | Downtime | Rollback | Complexity | Use case |
|---|---|---|---|---|
| Rolling Update | No | Automatic | Low | Standard |
| Blue-Green | No | Instant | Medium | Critical |
| Canary | No | Progressive | High | High criticality |
| Recreate | Yes | Manual | Very low | Batch jobs |
Optimized rolling update configuration:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
spec:
  replicas: 4
  selector:
    matchLabels:
      app: payment-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra pod during the rollout
      maxUnavailable: 0  # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: myapp:1.2.3
          lifecycle:
            preStop:
              # Let in-flight requests drain before SIGTERM is sent
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
```
To optimize your deployments, start with rolling update then evolve to canary when your observability allows.
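During and after a rolling update, these standard kubectl commands let you watch progress and revert quickly:

```shell
# Follow the rollout until it completes or fails
kubectl rollout status deployment/payment-api -n prod-team-payment

# Inspect revision history
kubectl rollout history deployment/payment-api -n prod-team-payment

# Roll back to the previous revision if error rates climb
kubectl rollout undo deployment/payment-api -n prod-team-payment
```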
Our Containerization and Docker Best Practices hub deepens each of these strategies.
Anti-patterns to absolutely avoid
Before concluding, here are errors you must absolutely avoid in your production clusters:
| Anti-pattern | Risk | Solution |
|---|---|---|
| No resource limits | Noisy neighbor, OOM kills | Define requests and limits |
| Using :latest | Non-reproducible deployments | Immutable versioned tags |
| Secrets in ConfigMaps | Sensitive data exposure | Secrets + encryption |
| Root pods | Maximum attack surface | SecurityContext non-root |
| No PodDisruptionBudget | Unavailability during maintenance | PDB with minAvailable |
| cluster-admin RBAC everywhere | Maximum blast radius | Namespace-scoped Roles |
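The PodDisruptionBudget mentioned in the table can be sketched as follows (the selector assumes the `payment-api` labels used in earlier examples):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api-pdb
  namespace: prod-team-payment
spec:
  minAvailable: 2  # node drains must leave at least 2 pods running
  selector:
    matchLabels:
      app: payment-api
```

With this budget in place, `kubectl drain` and cluster upgrades evict pods gradually instead of taking the whole service down.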
Take action: validate your skills
You now master essential recommendations to optimize your Kubernetes clusters in production. Each practice you apply reduces your incident risk and improves your service reliability.
To master these best practices, SFEIR Institute offers certification paths led by practitioners who manage such clusters daily:
- LFS458 Kubernetes Administration: 4 days to prepare CKA certification and master cluster administration
- LFD459 Kubernetes for Application Developers: 3 days for CKAD certification and cloud-native development
- Kubernetes Fundamentals: 1 day to discover essential concepts
As a CTO interviewed by Spectro Cloud points out: "Just given the capabilities that exist with Kubernetes, and the company's desire to consume more AI tools, we will use Kubernetes more in future." - State of Kubernetes 2025
Apply this checklist now and transform your Kubernetes deployments into reliable and secure production infrastructures.