
Ready...Steady...Scale!

Published Jul 11, 2018

Anyone wanting to deploy highly available apps on this thing called Kubernetes will eventually find herself wanting some automation in the scaling part.
What better feeling than to know that, when your app becomes popular while you sleep and people flock to your homepage, Kubernetes has your back!

This is where the Horizontal Pod Autoscaler (HPA) comes in as the tool for the job.

As stated in the documentation:

its job is to automatically scale the number of pods in a replication controller, deployment, or replica set, based on observed CPU utilization.

I get very frustrated when I turn to official documentation, only to read it and get the feeling that it was written for robots.
However, once I grind through it and finally get a clear view of the concepts, I love to share them, with the hope that I'll make the day of someone out there a little easier!

Essentially, the HPA gives your Kubernetes cluster the ability to monitor the load on your existing pods and determine whether or not more of them are needed.
This is a great benefit of using Kubernetes: it saves us from overloading individual pods, which can lead to unexpected behavior and arbitrary faults.

So, now that we know what HPA does, let's see how to deploy one in our cluster!

(Deploying a new cluster is out of the scope of this post [...but will surely be covered in a future one!]. My cluster is deployed inside an Ubuntu VirtualBox VM with kubeadm=1.9.7-00.)

Just a side note...:
I love automation so I am writing a book about it!
It is packed with comprehensive guides on the concepts of CI/CD, implemented with technologies like Kubernetes, Helm, Docker, GitLab, Draft, Skaffold, etc.
Do you want to be notified as soon as it is available? Get notified!
Do you know a friend/colleague that will be interested? Share it!

1. Check the kubectl version

vagrant@vagrant:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-19T00:05:56Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T18:53:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

As you can see, we will use version 1.9.
Since Kubernetes 1.8, the HPA by default uses metrics coming from the metrics-server instead of Heapster, and this will remain true for all future versions of Kubernetes.
Note: This is not limited to the HPA. Moving to the metrics-server is the general direction for Kubernetes.

2. Deploy the metrics-server

We must first clone the repo and then deploy it to our cluster.

git clone https://github.com/kubernetes-incubator/metrics-server.git
kubectl create -f metrics-server/deploy/1.8+/
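
Before moving on, it is worth verifying that the metrics-server is actually up and serving data. A quick sanity check (assuming the default k8s-app=metrics-server label from the repo's manifests; it may take a minute before the metrics API starts returning data):

vagrant@vagrant:~$ kubectl get pods -n kube-system -l k8s-app=metrics-server
vagrant@vagrant:~$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes

If the second command returns JSON with node usage instead of an error, the HPA will be able to get its metrics.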

3. Play with the HPA

Run a sample service

vagrant@vagrant:~$ kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80
service "php-apache" created
deployment "php-apache" created

Note that the created deployment has a CPU request set to 200m.

If you use one of your own deployments, make sure you set something like this under the pod template in the deployment's YAML file:

spec:
  containers:
  - name: ...
    image: ...
    imagePullPolicy: ...
    ports:
    - containerPort: ...
    # Set requests/limits to be able to control it via HorizontalPodAutoscaler
    resources:
      requests:
        memory: 100Mi
        cpu: 100m
      limits:
        memory: 100Mi
        cpu: 100m

The HPA cannot calculate CPU utilization, and therefore will not work, if the deployment it is supposed to control does not have resource requests defined.
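
If you want to double-check what the running deployment actually requests, you can inspect it directly (the jsonpath below assumes a single container in the pod template):

vagrant@vagrant:~$ kubectl get deployment php-apache -o jsonpath='{.spec.template.spec.containers[0].resources}'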

Create the HPA

vagrant@vagrant:~$ kubectl autoscale deployment php-apache --cpu-percent=10 --min=1 --max=5
deployment "php-apache" autoscaled

Check the HPA's status

vagrant@vagrant:~$ kubectl get hpa
NAME       REFERENCE             TARGETS       MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache <unknown>/10% 1       5       1        26s

Don't be thrown off by the <unknown>!
It will become 0% as soon as metrics start coming in from the metrics-server [~30 sec].
Note that we set the load threshold to an unrealistically low percentage (10%) in order to overwhelm it quickly.
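
If the target stays at <unknown> for much longer than that, kubectl describe usually tells you why (metrics not available yet, missing resource requests, etc.); look at the Conditions and Events at the bottom of the output:

vagrant@vagrant:~$ kubectl describe hpa php-apache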

Create some load

It is time to see how the HPA reacts to increased load. We are going to start a container whose only job is to hit the php-apache service in an infinite loop.

Note: We need to do this in a different terminal.

vagrant@vagrant:~$ kubectl run -i --tty load-generator --image=busybox /bin/sh
If you dont see a command prompt, try pressing enter.
/ # while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK

You can also run the above non-interactively like this:
kubectl run load-generator --image=busybox /bin/sh -- '-c' 'while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done'

Let it run for 30 seconds to one minute...and now check the HPA.

vagrant@vagrant:~$ kubectl get hpa
NAME       REFERENCE             TARGETS  MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 125%/10% 1       5       4        9m

Wow!!! I think we overdid it! 125%! :]
However, as we can see, the autoscaler kicked in and pumped up the replicas to 4.
Great!

Let's verify it, though...

vagrant@vagrant:~$ kubectl get pods
NAME                             READY     STATUS    RESTARTS   AGE
load-generator-5c4d59d5dd-5xtmx  1/1       Running   0          12m
php-apache-7ccc68c5cd-2v92p      1/1       Running   0          1m
php-apache-7ccc68c5cd-4kpvn      1/1       Running   0          1m
php-apache-7ccc68c5cd-bmptd      1/1       Running   0          1m
php-apache-7ccc68c5cd-m6nhn      1/1       Running   0          11m

Notice that autoscaling the replicas takes a few minutes. There is a time difference between the point where we start the load-generator and the point where we actually see the new pods running. Be patient.
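
Instead of re-running the commands, you can also watch the HPA and the pods react in real time (in two separate terminals) with the -w flag:

vagrant@vagrant:~$ kubectl get hpa php-apache -w
vagrant@vagrant:~$ kubectl get pods -w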

Stop the load

In the terminal where we see the OK!OK!OK!OK!OK!OK!OK!OK!OK! output, we should terminate the load generation by typing <Ctrl> + C.
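
Note that Ctrl+C only stops the loop; the load-generator deployment created by kubectl run keeps an idle pod around. Once you are done experimenting, you can clean it up:

vagrant@vagrant:~$ kubectl delete deployment load-generator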

Check the HPA again

Now that there is no load hitting our service, we should watch the deployment [via the HPA] scale down.

vagrant@vagrant:~$ kubectl get hpa
NAME       REFERENCE             TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/10%  1       5       1        20m

Note: just like the scale-out phase, scaling back in takes a while (the HPA's downscale delay defaults to five minutes)...be patient.

Good job!

... for more on autoscaling, check out this lecture

Now you can [and should] deploy an HPA for every deployment of your clustered app...and have a good night's sleep!
