Imagine you’re the manager of a snack bar at a park. On a sunny day, lots of people come to enjoy the park, and they all want snacks. Now, you have a few workers at your snack bar who make and serve the snacks.
Horizontal Pod autoscaling in Kubernetes is like having magical helpers who adjust the number of snack makers (pods) based on how many people want snacks (traffic).
Here’s how it works:
- Average days: You might only need one or two snack makers on regular days with fewer people. In Kubernetes terms, you have a few pods running your application.
- Busy days: But when it’s a sunny weekend, and everyone rushes to the park, more people want snacks. Your magical helpers (Horizontal Pod autoscaling) notice the increase in demand. They say, “We need more snack makers!” So, more snack makers (pods) are added automatically to handle the rush.
- Scaling down: Once the sun sets and the crowd leaves, you don’t need as many snack makers anymore. Your magical helpers see the decrease in demand and say, “We can have fewer snack makers now.” So, extra snack makers (pods) are removed, saving resources.
- Automatic adjustment: These magical helpers monitor the crowd and adjust the number of snack makers (pods) in real time. When the demand goes up, they deploy more. When it goes down, they remove some.
In the same way, Kubernetes Horizontal Pod autoscaling watches how busy your application is. If there’s more traffic (more people wanting your app), it automatically adds more pods. If things quieten down, it scales the number of pods back down. This helps your app handle varying traffic without manual intervention.
So, Horizontal Pod autoscaling is like having magical assistants that ensure your application has the correct number of workers (pods) to handle the crowd (traffic) efficiently.
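Under the hood, the autoscaler’s decision comes down to a simple ratio: the observed metric divided by the target metric, multiplied by the current replica count and rounded up. Here is a minimal Python sketch of that formula (the function name and example numbers are illustrative; the formula itself is the one documented for the Kubernetes HorizontalPodAutoscaler):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Core HPA formula: scale replicas in proportion to how far the
    observed metric is from the target, rounding up."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# Two pods averaging 90% CPU against a 50% target -> scale up to 4 pods.
print(desired_replicas(2, 90, 50))  # 4
# One pod at 10% against a 50% target -> ceil(0.2) keeps it at 1 pod.
print(desired_replicas(1, 10, 50))  # 1
```

In practice the controller also applies tolerances, min/max bounds, and stabilization windows on top of this ratio, so real clusters will not flap on every small metric change.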
HorizontalPodAutoscaler is a Kubernetes resource that automatically adjusts the replica count of your Deployment (and the ReplicaSet it manages) based on defined factors, the most common being CPU and memory utilization.
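To make this concrete, here is what a minimal HorizontalPodAutoscaler manifest looks like using the `autoscaling/v2` API (a sketch for orientation only; the target Deployment name, the 50% CPU target, and the 1–5 replica range are illustrative values, not part of this exercise):

```
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```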
To understand this better, let’s create an nginx Deployment, and this time, we will set resource limits within the pod. Resource limits are a vital element that enables the HorizontalPodAutoscaler resource to function: the autoscaler measures percentage utilization against the pod’s resource requests, and because Kubernetes defaults requests to the limits when only limits are specified, the utilization here is effectively measured against these limit values. When that utilization crosses the target, the autoscaler spins up a new replica. We will use the following nginx-autoscale-deployment.yaml manifest under ~/modern-devops/ch6/deployments for this exercise:
…
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
…
Use the following command to perform a new deployment:
$ kubectl apply -f nginx-autoscale-deployment.yaml
Let’s expose this deployment with a LoadBalancer Service resource and get the external IP:
$ kubectl expose deployment nginx --port 80 --type LoadBalancer
$ kubectl get svc nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
nginx LoadBalancer 10.3.243.225 34.123.234.57 80:30099/TCP
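With the Deployment exposed, the autoscaler itself can be created imperatively with `kubectl autoscale` (a sketch; the 50% CPU target and the 1–5 replica range are example values, not prescribed by this exercise):

```
$ kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=5
$ kubectl get hpa nginx
```

The second command shows the autoscaler’s current and target utilization, which is a quick way to confirm it is reading metrics before you send any load at the external IP.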