Now, let's autoscale this Deployment. We want at least 1 pod replica and at most 5 pod replicas, with the autoscaler targeting an average CPU utilization of 25%. Use the following command to create a HorizontalPodAutoscaler resource:
$ kubectl autoscale deployment nginx --cpu-percent=25 --min=1 --max=5
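If you prefer the declarative approach, the same HorizontalPodAutoscaler resource can be expressed as a manifest. Here is a minimal sketch using the autoscaling/v1 API, which you would apply with kubectl apply -f hpa.yaml:
# hpa.yaml -- declarative equivalent of the kubectl autoscale command above
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 25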
Now that we have created the HorizontalPodAutoscaler resource, we can load test the application using the hey load-testing utility, which comes preinstalled in Google Cloud Shell. Before you fire the load test, open a duplicate shell session and watch the Deployment resource using the following command:
$ kubectl get deployment nginx -w
Open another duplicate shell session and watch the HorizontalPodAutoscaler resource using the following command:
$ kubectl get hpa nginx -w
Now, in the original window, run the following command to fire a load test:
$ hey -z 120s -c 100 http://34.123.234.57
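Note that 34.123.234.57 is the external IP of the Service from the previous section; yours will differ. Assuming your Service is named nginx, you can look it up with the following command:
$ kubectl get svc nginx -o jsonpath='{.status.loadBalancer.ingress[0].ip}'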
This starts a load test that runs for 2 minutes, with 100 concurrent users continuously hammering the Service. In the window where you're watching the HorizontalPodAutoscaler resource, you will see output similar to the following. As soon as the load test starts, the average utilization jumps to 46%. The HorizontalPodAutoscaler resource waits for a short while, then increases the replicas, first to 2, then to 4, and finally to 5. When the test completes, the utilization drops quickly to 27%, 23%, and finally 0%. Once the utilization reaches 0%, the HorizontalPodAutoscaler resource gradually spins the replicas back down from 5 to 1:
$ kubectl get hpa nginx -w
NAME    REFERENCE          TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
nginx   deployment/nginx   <unknown>/25%   1         5         1          32s
nginx   deployment/nginx   46%/25%         1         5         1          71s
nginx   deployment/nginx   46%/25%         1         5         2          92s
nginx   deployment/nginx   92%/25%         1         5         4          2m2s
nginx   deployment/nginx   66%/25%         1         5         5          2m32s
nginx   deployment/nginx   57%/25%         1         5         5          2m41s
nginx   deployment/nginx   27%/25%         1         5         5          3m11s
nginx   deployment/nginx   23%/25%         1         5         5          3m41s
nginx   deployment/nginx   0%/25%          1         5         4          4m23s
nginx   deployment/nginx   0%/25%          1         5         2          5m53s
nginx   deployment/nginx   0%/25%          1         5         1          6m30s
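The gradual scale-down at the end of this output is not accidental: by default, the HorizontalPodAutoscaler applies a five-minute downscale stabilization window to avoid replica thrashing. On clusters that support the autoscaling/v2 API, you can tune this through the behavior field. The following manifest is a sketch of that option, not something this exercise requires:
# Sketch: the same HPA on autoscaling/v2, with a shorter downscale window
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 25
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60   # default is 300 seconds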
Likewise, in the other window, we can see the replica count of the Deployment change as the HorizontalPodAutoscaler resource applies its scaling decisions:
$ kubectl get deployment nginx -w
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           18s
nginx   1/2     1            1           77s
nginx   2/2     2            2           79s
nginx   2/4     2            2           107s
nginx   3/4     4            3           108s
nginx   4/4     4            4           109s
nginx   4/5     4            4           2m17s
nginx   5/5     5            5           2m19s
nginx   4/4     4            4           4m23s
nginx   2/2     2            2           5m53s
nginx   1/1     1            1           6m30s
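To understand why the HorizontalPodAutoscaler made each of these scaling decisions, you can inspect its events with the following command; the output will vary from run to run, so it is omitted here:
$ kubectl describe hpa nginx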
Besides CPU and memory, you can scale your workloads on other parameters, such as network traffic. You can also use external metrics, such as latency, to decide when to scale your workloads.
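For example, with the autoscaling/v2 API and a custom metrics adapter installed in the cluster (such as the Prometheus Adapter or the Custom Metrics Stackdriver Adapter), an HPA can target a per-pod request rate. The metric name below, http_requests_per_second, is hypothetical and depends entirely on what your adapter exposes:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical; exposed by your metrics adapter
      target:
        type: AverageValue
        averageValue: "100"              # scale so each pod serves ~100 requests/second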
Tip
While scaling on CPU and memory with the HorizontalPodAutoscaler resource is a good start, you should also consider scaling on external metrics such as response time and network latency. These metrics directly impact customer experience and are crucial to your business, so scaling on them leads to better reliability.
So far, we have been dealing with stateless workloads. In practice, however, some applications need to persist state. Let's look at some considerations for managing stateful applications.