Validating horizontal pod autoscaling on EKS with Gremlin

Scaling is one of the most important functions of a microservices platform. Modern applications must be able to handle changes in demand, and the way to accomplish this is with automatic scaling. Kubernetes provides built-in scalability through its Horizontal Pod Autoscaler (HPA), but to make the most of the HPA, we need to validate that it's configured optimally for our workloads.
In this tutorial, we'll show you how to set up HPA on an Amazon EKS cluster and test it using Gremlin.
Before starting this tutorial, you'll need:
- An Amazon EKS cluster with worker nodes (this tutorial assumes dual-core nodes)
- kubectl and Helm installed and configured to access your cluster
- A Gremlin account
This tutorial will show you how to:
- Deploy the Gremlin daemonset to your EKS cluster
- Deploy the Kubernetes Metrics Server
- Deploy a demo application and configure an HPA rule for it
- Run a CPU experiment with Gremlin to validate that the HPA scales the application as expected
First, we need to deploy the Gremlin daemonset to our cluster. The easiest way to do this is with the Gremlin Helm Chart.
Start by adding the Gremlin Helm Chart repository:
helm repo add gremlin https://helm.gremlin.com/
Next, create a Kubernetes namespace for Gremlin:
kubectl create namespace gremlin
Before we deploy the daemonset, we need to retrieve our Gremlin team ID and secret key in order to authenticate the daemonset. Log into the Gremlin web app, click on the circular avatar in the top-right corner, select "Team Settings," then select the "Configuration" tab.
Make sure to copy your team ID and your secret key. If you're unsure what your secret key is, you can either reset it or use certificate-based authentication instead. See the Helm chart documentation for more information.
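As a rough sketch of the certificate-based alternative (the value names below follow the Helm chart documentation, but confirm them against the current chart before relying on them), you would download your team certificate pair and pass the files to Helm instead of a secret:

helm install gremlin gremlin/gremlin --namespace gremlin \
  --set gremlin.hostPID=true \
  --set gremlin.container.driver=containerd-runc \
  --set gremlin.secret.managed=true \
  --set gremlin.secret.type=certificate \
  --set gremlin.secret.teamID=$GREMLIN_TEAM_ID \
  --set gremlin.secret.clusterID=$GREMLIN_CLUSTER_ID \
  --set-file gremlin.secret.certificate=./gremlin.cert \
  --set-file gremlin.secret.privateKey=./gremlin.key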
Assuming secret authentication, use the following command to deploy Gremlin to your cluster. Replace $GREMLIN_TEAM_ID with your team ID, $GREMLIN_TEAM_SECRET with your secret, and $GREMLIN_CLUSTER_ID with the name you want to use for this cluster in the Gremlin web app.
helm install gremlin gremlin/gremlin --namespace gremlin \
  --set gremlin.hostPID=true \
  --set gremlin.container.driver=containerd-runc \
  --set gremlin.secret.managed=true \
  --set gremlin.secret.type=secret \
  --set gremlin.secret.teamID=$GREMLIN_TEAM_ID \
  --set gremlin.secret.clusterID=$GREMLIN_CLUSTER_ID \
  --set gremlin.secret.teamSecret=$GREMLIN_TEAM_SECRET
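Once the chart installs, it's worth confirming that a Gremlin pod is running on each worker node before moving on:

kubectl get pods -n gremlin -o wide

Each pod in the daemonset should report a Running status.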
Note that this tutorial requires version 2.14.7 or later of the Gremlin daemonset. If you have an earlier version installed, you can upgrade to the latest version by using the following Helm command:
helm upgrade gremlin gremlin/gremlin -n gremlin
You can confirm that the upgrade worked by running:
helm history gremlin -n gremlin
The output will be similar to the following:
REVISION  UPDATED                   STATUS      CHART          APP VERSION  DESCRIPTION
1         Sun Mar  8 22:33:00 2020  superseded  gremlin-0.1.8  2.12.21      Install complete
2         Tue Apr 14 17:55:48 2020  deployed    gremlin-0.1.8  2.12.21      Upgrade complete
The HPA requires metrics collected by the Metrics Server, which aggregates resource data in your cluster. EKS doesn't deploy the Metrics Server by default, but you can deploy it by using the following command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Next, we'll verify that the Metrics Server was deployed:
kubectl get deployment metrics-server -n kube-system
This will show output similar to the following:
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           2m
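To double-check that metrics are actually flowing, you can query the Metrics Server directly. It may take a minute or two after deployment before data appears:

kubectl top nodes
kubectl top pods -n kube-system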
To test the HPA, we'll deploy a demo application to our cluster. The application we'll use is Online Boutique, an e-commerce website made up of eleven containerized services. Each service runs a different part of the website, such as the frontend, shopping cart, and ads.
To run it, create a new namespace called onlineboutique and deploy the manifest file there:
kubectl create ns onlineboutique
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/master/release/kubernetes-manifests.yaml -n onlineboutique
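The application can take a few minutes to start. One way to wait until every deployment in the namespace is available (the 300-second timeout here is an arbitrary choice) is:

kubectl wait --for=condition=available deployment --all -n onlineboutique --timeout=300s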
Once the deployment finishes, retrieve the IP address of the frontend service and open it in your web browser:
kubectl get service frontend-external -n onlineboutique | awk '{print $4}'
EXTERNAL-IP
http://your-cluster-address:80
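The awk approach prints the column header along with the address. If you'd rather get just the address, a jsonpath query works too; on EKS the load balancer address typically appears under hostname rather than ip:

kubectl get service frontend-external -n onlineboutique \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'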
Next, let's configure the HPA for our frontend deployment. The frontend deployment handles all user traffic for Online Boutique. We'll create an HPA rule that adds a new pod when average CPU usage across all frontend pods exceeds a certain threshold. We can also set minimum and maximum pod counts to prevent over-provisioning. This command sets our CPU threshold at 50% and limits scaling to a maximum of ten pods:
kubectl autoscale deployment frontend --cpu-percent=50 --min=1 --max=10 -n onlineboutique
The output will look similar to this:
horizontalpodautoscaler.autoscaling/frontend autoscaled
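kubectl autoscale is imperative shorthand. If you manage your cluster declaratively, the equivalent HPA manifest looks roughly like this (using the autoscaling/v2 API available on current Kubernetes versions; older clusters may need autoscaling/v2beta2):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
  namespace: onlineboutique
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

Applying this with kubectl apply -f creates the same rule as the command above.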
We can verify that our HPA rule is active by running this command:
kubectl get hpa -n onlineboutique
This will show output similar to the following:
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
frontend   Deployment/frontend   2%/50%    1         10        1          48s
The HPA is now active on the frontend deployment. When average CPU usage rises above 50%, the HPA will add pods (up to ten); when it falls back below 50%, the HPA will remove pods until only one remains. Next, we should test this rule to validate that our frontend can scale quickly and reliably during periods of elevated demand.
Now that our HPA rule is in place, let's test that it's configured properly. We want to make sure our frontend automatically scales up (and down) without problems. Using Gremlin, we'll create an experiment that consumes CPU resources in the deployment. Our hypothesis is that this will trigger our HPA rule. During the experiment, we'll monitor the number of pods in the deployment and keep an eye on our website to ensure we scale successfully.
It's important to note that the frontend deployment has both CPU and memory limits already defined:
resources:
  requests:
    cpu: 100m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 128Mi
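You can confirm these values on your own cluster by reading them straight off the deployment:

kubectl get deployment frontend -n onlineboutique \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'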
This is important for determining the magnitude (the intensity) we should use for our experiment. A Gremlin CPU attack run on a Kubernetes pod consumes resources based on the pod's requests, not its limits. For example, if we ran an attack consuming 5% of a single core, this would use 50m CPU, which equates to half of our requested capacity of 100m. If we consumed 10% on a single core, or 5% on two cores, this would use 100m CPU. When running a CPU attack on all cores, it's important to know how many cores the host has, as this affects the overall CPU percentage used. For our experiment, we'll consume 5% across two cores, since we're running our cluster on dual-core machines.
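If you're not sure how many cores your worker nodes have, you can read the capacity from the node objects before picking a magnitude:

kubectl get nodes -o custom-columns=NAME:.metadata.name,CORES:.status.capacity.cpu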
Log into the Gremlin web app, then select Attacks > New Attack. Select Kubernetes, then select the frontend Deployment.
Under Choose a Gremlin, select Resource > CPU. Enter 5 for CPU capacity and set the number of cores to 2. Keep Any Container selected, then click Unleash Gremlin to run the attack.
While the attack is running, let's monitor our HPA by using the watch command. watch refreshes the output of a console command at a customizable interval. Here, we'll use it to refresh the output of kubectl every five seconds:
watch -n 5 'kubectl get hpa -n onlineboutique frontend'
The output will look similar to the following:
Every 5.0s: kubectl get hpa -n onlineboutique frontend          Tue Dec 15 10:13:15 2020

NAME       REFERENCE             TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
frontend   Deployment/frontend   <unknown>/50%   1         10        1          2m42s
Before long, the number of replicas will increase from 1 to 2, indicating that a new pod was deployed. Once this pod is in a running state, we can verify that the frontend is still fully functional by refreshing our application in a web browser, or by monitoring the load-balancing service included in the application. If all goes as expected, our website will be just as performant as it was before, and we'll now have a second frontend pod to handle the additional load. Once our experiment stops, we should continue monitoring our HPA to make sure that after CPU usage returns to normal, our deployment scales back down and the now-unused pod gets removed.
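Alongside the HPA output, watching the pods themselves and the HPA's event log makes the scale-up (and the later scale-down) easy to observe. The app=frontend label below matches the labels used in the Online Boutique manifests:

watch -n 5 'kubectl get pods -n onlineboutique -l app=frontend'
kubectl describe hpa frontend -n onlineboutique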
Now that you've validated your HPA rules using Gremlin, consider expanding the scope of your experiments by:
- Increasing the magnitude of the CPU attack to verify that the deployment scales all the way to its maximum pod count
- Running the same experiment against other deployments in the cluster, such as the cart or checkout services
- Testing scaling rules based on other metrics, such as memory, alongside CPU
Gremlin empowers you to proactively root out failure before it causes downtime. See how you can harness chaos to build resilient systems by requesting a demo of Gremlin.