Kubernetes Autoscaling

Overview

We can scale apps manually by changing replicas in deployments. But what if we want automatic scaling based on demand? Kubernetes provides three main autoscaling mechanisms:

Horizontal Pod Autoscaler (HPA) - Scales number of pod replicas based on CPU/memory usage
Vertical Pod Autoscaler (VPA) - Adjusts CPU/memory requests and limits for containers
Cluster Autoscaler - Adds or removes nodes based on pod resource requests

These help applications handle varying loads without manual work.

Horizontal Pod Autoscaler (HPA)

HPA automatically scales pod replicas in deployments, replica sets, or stateful sets based on CPU usage or other metrics.

Create HPA with kubectl

# Create HPA for deployment (scales 1-10 replicas at 50% CPU)
kubectl autoscale deployment my-app --min=1 --max=10 --cpu-percent=50

# View HPA
kubectl get hpa

# Watch HPA in real-time
kubectl get hpa my-app --watch

# Describe HPA for details
kubectl describe hpa my-app

HPA YAML Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

HPA Commands

# Apply HPA manifest
kubectl apply -f hpa.yaml

# Delete HPA
kubectl delete hpa my-app-hpa

# Edit HPA
kubectl edit hpa my-app-hpa

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts CPU and memory requests/limits for containers based on actual usage patterns.

Install VPA

# Clone VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/

# Install VPA
./hack/vpa-install.sh

# Verify installation
kubectl get pods -n kube-system | grep vpa

VPA YAML Example

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # Auto, Off, Initial
  resourcePolicy:
    containerPolicies:
    - containerName: my-container
      maxAllowed:
        cpu: 1
        memory: 500Mi
      minAllowed:
        cpu: 100m
        memory: 50Mi

VPA Update Modes

Auto - VPA assigns resource requests and restarts pods when needed
Initial - VPA assigns resource requests only when pods are created
Off - VPA only provides recommendations, no automatic updates

VPA Commands

# Apply VPA
kubectl apply -f vpa.yaml

# Get VPA status
kubectl get vpa

# Describe VPA for recommendations
kubectl describe vpa my-app-vpa

# Watch VPA
kubectl get vpa my-app-vpa --watch

When to Use HPA vs VPA

Use HPA when you want to scale number of pod replicas based on demand
Use VPA when you want to adjust resource requests/limits of existing pods
Use both together carefully - they can conflict with each other
Always monitor application behavior and adjust configs as needed

Metrics Server

Metrics Server collects resource usage data from nodes and provides APIs for HPA and VPA to make scaling decisions.

Install Metrics Server

# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Check installation
kubectl get deployment metrics-server -n kube-system

# Wait for it to be ready
kubectl wait --for=condition=available --timeout=300s deployment/metrics-server -n kube-system

View Resource Usage

# View node resource usage
kubectl top nodes

# View pod resource usage
kubectl top pods

# View pod usage in specific namespace
kubectl top pods -n kube-system

# View pod usage with containers
kubectl top pods --containers

Metrics Server vs Other Tools

Metrics Server - Basic CPU/memory metrics for HPA/VPA
Prometheus - Comprehensive monitoring with custom metrics and long-term storage
Grafana - Visualization tool for creating dashboards from Prometheus data

Cluster Autoscaler

Cluster Autoscaler automatically adds or removes nodes based on pod resource requests. Works with cloud providers like AWS, GCP, and Azure.

AWS Example Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster

Cluster Autoscaler Commands

# Check cluster autoscaler logs
kubectl logs -f deployment/cluster-autoscaler -n kube-system

# Check cluster autoscaler status
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml

# View nodes
kubectl get nodes

How Cluster Autoscaler Works

Scale Up - When pods can't be scheduled due to insufficient resources
Scale Down - When nodes are underutilized for a period of time
Node Selection - Uses expander policies (least-waste, most-pods, etc.)

Best Practices

Resource Requests and Limits

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

General Guidelines

Always set resource requests and limits for containers
Start with conservative scaling policies and adjust based on monitoring
Use HPA for stateless applications
Be careful when combining HPA and VPA
Monitor scaling events and adjust thresholds as needed
Test autoscaling behavior under different load conditions

Troubleshooting

Common HPA Issues

# Check if metrics server is running
kubectl get pods -n kube-system | grep metrics-server

# Check HPA events
kubectl describe hpa my-app-hpa

# Check if pods have resource requests
kubectl describe pod <pod-name>

# Check HPA metrics
kubectl get hpa my-app-hpa -o yaml

Common VPA Issues

# Check VPA components
kubectl get pods -n kube-system | grep vpa

# Check VPA recommendations
kubectl describe vpa my-app-vpa

# Check VPA events
kubectl get events --field-selector involvedObject.name=my-app-vpa

Common Cluster Autoscaler Issues

# Check cluster autoscaler logs
kubectl logs deployment/cluster-autoscaler -n kube-system

# Check pending pods
kubectl get pods --field-selector=status.phase=Pending

# Check node capacity
kubectl describe nodes

Debug Commands

# Check resource usage
kubectl top nodes
kubectl top pods

# Check events
kubectl get events --sort-by=.metadata.creationTimestamp

# Check pod resource requests vs limits
kubectl describe pod <pod-name> | grep -A 10 "Requests:"

Overview​

Horizontal Pod Autoscaler (HPA)​

Create HPA with kubectl​

HPA YAML Example​

HPA Commands​

Vertical Pod Autoscaler (VPA)​

Install VPA​

VPA YAML Example​

VPA Update Modes​

VPA Commands​

When to Use HPA vs VPA​

Metrics Server​

Install Metrics Server​

View Resource Usage​

Metrics Server vs Other Tools​

Cluster Autoscaler​

AWS Example Configuration​

Cluster Autoscaler Commands​

How Cluster Autoscaler Works​

Best Practices​

Resource Requests and Limits​

General Guidelines​

Troubleshooting​

Common HPA Issues​

Common VPA Issues​

Common Cluster Autoscaler Issues​

Debug Commands​