AWS Tutorial: Auto Scaling Docker Containers in Amazon ECS

Published Apr 10, 2017

In this AWS tutorial, I will demonstrate how you can, in just a few steps, transform a simple containerized web application into an AWS ECS Service (referred to simply as a Service) that scales automatically in response to changing demand.

Amazon introduced EC2 Container Service (ECS) in 2015 as a reaction to the rapidly growing popularity of Docker containers and microservices architecture. ECS provides a clustering and orchestration layer for controlling the life-cycle of containerized deployments on EC2 host machines (called ECS Instances in this context). ECS is an alternative to tools such as Kubernetes, Docker Swarm, or Mesos.

Workloads within ECS are handled by Tasks. Each Task is launched from a Task Definition that describes one or more Docker containers, their port mappings, CPU and memory limits, and other attributes. Each Task must always fit into one ECS Instance. Although this is sometimes seen as a limitation, it can also be viewed as a sort of implicit colocation feature, similar to Kubernetes Pods.
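The following minimal Python sketch makes the "one Task, one ECS Instance" rule concrete. The `Container` class, the `task_fits_on_instance` helper and all numbers are hypothetical. The only ECS conventions it assumes are 1,024 CPU units per vCPU and memory expressed in MiB.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Resource reservation of one container in a (hypothetical) Task Definition."""
    name: str
    cpu: int     # ECS CPU units (1024 units = 1 vCPU)
    memory: int  # MiB


def task_fits_on_instance(containers, free_cpu, free_memory):
    """Return True if all containers of one Task fit on a single ECS Instance.

    ECS never splits a Task across instances, so the summed reservations of
    its containers must fit into the free capacity of one instance.
    """
    total_cpu = sum(c.cpu for c in containers)
    total_memory = sum(c.memory for c in containers)
    return total_cpu <= free_cpu and total_memory <= free_memory


# Hypothetical Task: a web app plus a small log-shipping sidecar.
task = [Container("web", cpu=512, memory=768), Container("log-shipper", cpu=128, memory=256)]
print(task_fits_on_instance(task, free_cpu=1024, free_memory=2048))  # True
print(task_fits_on_instance(task, free_cpu=1024, free_memory=900))   # False: 1024 MiB needed
```

Suppose two instances each have 600 MiB of free memory. Between them they have more than this Task needs, but the Task still cannot use them, because its containers are always placed together on one instance.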

What Is Auto Scaling and Why Do We Need It

A number of tasks can be grouped into a Service. Setting the number of “Desired Tasks” of a Service allows us to scale applications horizontally across ECS Instances to handle more workload in parallel.
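As an illustration, here is a sketch of the request parameters involved when we create a Service and later change its "Desired Tasks" by hand. The dictionaries mirror the keyword arguments of boto3's ECS `create_service` and `update_service` calls. The cluster, service and task definition names are hypothetical, and nothing is sent to AWS. Auto Scaling, covered below, automates exactly this kind of `desiredCount` change.

```python
def create_service_request(cluster, service_name, task_definition, desired_tasks):
    """Keyword arguments for boto3's ECS create_service call."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,  # "family:revision" or a full ARN
        "desiredCount": desired_tasks,      # the "Desired Tasks" setting
    }


def scale_service_request(cluster, service_name, desired_tasks):
    """Keyword arguments for boto3's ECS update_service call (manual scaling)."""
    if desired_tasks < 0:
        raise ValueError("desired_tasks must not be negative")
    return {"cluster": cluster, "service": service_name, "desiredCount": desired_tasks}


# Hypothetical names; nothing is sent to AWS here.
create_kwargs = create_service_request("demo-cluster", "web", "web-app:1", desired_tasks=2)
scale_kwargs = scale_service_request("demo-cluster", "web", desired_tasks=4)

# With boto3 installed and AWS credentials configured, these could be sent as:
#   import boto3
#   ecs = boto3.client("ecs")
#   ecs.create_service(**create_kwargs)
#   ecs.update_service(**scale_kwargs)
```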

Most applications have to deal with load that changes over time, whether gradually or all of a sudden. Perhaps the application already has an established and fairly predictable traffic pattern. Perhaps its traffic is sensitive to popular stories on social networks, and it receives a sudden crush of visitors a few times a week.

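The naive way to deal with variable traffic is over-provisioning. This approach is naive mainly because it is static: it only adds a cushion of spare capacity. That cushion sits idle (and costs money) during quiet periods, and it may still be too small for an unexpected spike.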

To solve the issue, modern cloud infrastructure providers, such as Amazon Web Services, provide us with detailed insight into the utilization of resources and a complete toolkit for setting up effective event-driven Auto Scaling.

Before we start, we need to distinguish between:
Scaling of the number of ECS Instances in a cluster
Scaling in terms of adding or removing Tasks within a Service

The first type of scaling adjusts the amount of computational resources available to a cluster (shared by all Services in that cluster). The second type scales an application’s capacity to handle load by deploying an appropriate number of containers across the available pool of resources. These two types of Auto Scaling work hand in hand, each on a separate layer of infrastructure, to keep applications running smoothly.

In the following paragraphs, I will focus on scaling the number of Tasks within a Service. I assume we already have an Auto Scaling Group configured to manage the number of EC2 Instances associated with the cluster.

Prepare a Docker Container for Auto Scaling in ECS

As the first step, we must specify CPU and Memory constraints for the Docker containers in the Task Definition we will associate with our Service. ECS uses these constraints to limit the amount of resources each container may use, and also to determine the overall Service utilization. Based on this utilization, CloudWatch Alarms trigger the scaling policies that decide whether the Service scales up, scales down, or stays the same.

Fig. 1: CPU and Memory constraints in Container Definitions
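For readers who prefer code to the console, here is a sketch of a Task Definition with these constraints. It is written as the keyword arguments of boto3's ECS `register_task_definition` call. The family name, the image and the exact values are assumptions for illustration only. The helper shows what a single Task reserves in total.

```python
# Keyword arguments for boto3's ECS register_task_definition call.
task_definition_kwargs = {
    "family": "web-app",  # hypothetical family name
    "containerDefinitions": [
        {
            "name": "web",
            "image": "nginx:latest",  # placeholder image
            "essential": True,
            "cpu": 256,     # CPU units reserved for this container (1024 = 1 vCPU)
            "memory": 512,  # hard memory limit in MiB; the container is killed if it exceeds it
            "portMappings": [
                # hostPort 0 requests a dynamic host port (bridge network mode),
                # so several copies of this Task can run on the same instance.
                {"containerPort": 80, "hostPort": 0, "protocol": "tcp"}
            ],
        }
    ],
}


def reserved_per_task(kwargs):
    """Total CPU units and MiB of memory one Task reserves, summed over its containers."""
    containers = kwargs["containerDefinitions"]
    return sum(c["cpu"] for c in containers), sum(c["memory"] for c in containers)


print(reserved_per_task(task_definition_kwargs))  # (256, 512)

# With boto3 installed and AWS credentials configured:
#   import boto3
#   boto3.client("ecs").register_task_definition(**task_definition_kwargs)
```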

The following formula illustrates how ECS calculates Service CPU utilization:

$$\text{Service CPU utilization} = \frac{(\text{Total CPU units used by tasks in service}) \times 100}{(\text{Total CPU units reserved in task def.}) \times (\text{number of tasks in service})}$$
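Below is a small worked example of the formula as a Python function. The function name and the per-Task usage numbers are made up. The calculation itself follows the formula term by term: total CPU units used, times 100, divided by the reservation per Task times the number of Tasks.

```python
def service_cpu_utilization(cpu_units_used_per_task, cpu_units_reserved_per_task):
    """Service CPU utilization in percent, following the formula above.

    cpu_units_used_per_task: CPU units currently used by each running Task.
    cpu_units_reserved_per_task: CPU units reserved for one Task in the Task Definition.
    """
    number_of_tasks = len(cpu_units_used_per_task)
    if number_of_tasks == 0:
        raise ValueError("the Service has no running Tasks")
    total_used = sum(cpu_units_used_per_task)
    return total_used * 100 / (cpu_units_reserved_per_task * number_of_tasks)


# Made-up numbers: three Tasks, each reserving 256 CPU units.
print(round(service_cpu_utilization([200, 230, 180], 256), 1))  # 79.4
```

Here 610 of the 768 reserved CPU units are in use. That leaves the Service just under the 80% threshold used for the scale-up Alarm later in this tutorial, so a little more load would trigger it.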

Figuring out the correct values for CPU and Memory constraints for a real-world application is usually a matter of experimentation. There’s no size that fits all.

Trigger ECS Service Scaling with CloudWatch Alarms and Scaling Policies

Since Auto Scaling is a reactive method of dealing with changing traffic, scaling only happens in response to changes in observed metrics. That’s why we need to define a pair of CloudWatch Alarms: one that will trigger adding extra Tasks to the Service when the CPU utilization is too high, and another that will trigger removing an existing Task when the Service CPU utilization is low enough.
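The logic of this pair of Alarms can be modelled in a few lines. This is a deliberately simplified sketch, not CloudWatch’s real evaluation logic: it ignores missing data and "M out of N" datapoint settings. It uses the thresholds chosen later in this tutorial, 80% for 1 minute to scale up and 10% for 15 minutes to scale down.

```python
def alarm_breached(datapoints, threshold, periods, above):
    """Simplified alarm check: the last `periods` datapoints all breach the threshold."""
    if len(datapoints) < periods:
        return False  # not enough data yet
    recent = datapoints[-periods:]
    if above:
        return all(d > threshold for d in recent)
    return all(d < threshold for d in recent)


def scaling_decision(cpu_per_minute):
    """+1 = add a Task, -1 = remove a Task, 0 = do nothing."""
    if alarm_breached(cpu_per_minute, threshold=80, periods=1, above=True):
        return +1
    if alarm_breached(cpu_per_minute, threshold=10, periods=15, above=False):
        return -1
    return 0


print(scaling_decision([40, 55, 85]))  # 1
print(scaling_decision([5] * 15))       # -1
print(scaling_decision([5] * 14))       # 0: only 14 quiet minutes so far
```

The wide gap between the two thresholds is deliberate. Anywhere between 10% and 80%, neither Alarm fires, so small fluctuations in load do not cause Tasks to be added and removed back and forth.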

How to Configure CloudWatch Alarms

Enough theory, let’s open the AWS Console and configure CloudWatch Alarms and Scaling Policies.

In the CloudWatch Alarms console, click the Create Alarm button. We need to select the right metric for our new Alarm. A good metric for scaling a Service up and down is CPUUtilization in the AWS/ECS namespace. We type "ECS CPUUtilization" into the search field to narrow down the list of metrics, and select the checkbox next to the ClusterName/ServiceName combination we want to create the Alarm for.

Fig. 2: Selecting the CPUUtilization CloudWatch Metric

Note: Alternatively we can consider using the MemoryUtilization metric, depending on the nature of our application.

After clicking the Next button, we define the Alarm threshold. In our example, we have chosen to fire the alarm every time the average CPUUtilization is higher than 80% for a period of 1 minute. That seems like a good sign that another container would be useful!

Fig. 3: Configuring the CloudWatch Alarm
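The same Alarm can also be expressed as the keyword arguments of boto3’s CloudWatch `put_metric_alarm` call. The sketch below assumes hypothetical cluster, service and alarm names. It mirrors the console settings: the AWS/ECS namespace, CPUUtilization, the ClusterName/ServiceName dimensions, and an average above 80% for one 1-minute period. As in the console walkthrough, no actions are attached yet.

```python
def cpu_alarm_request(cluster, service, alarm_name, threshold, period_seconds,
                      evaluation_periods, comparison):
    """Keyword arguments for boto3's CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": comparison,
        "AlarmActions": [],  # left empty for now, as in the console walkthrough
    }


# Average CPUUtilization above 80% for one 1-minute period (hypothetical names).
high_cpu_alarm = cpu_alarm_request("demo-cluster", "web", "web-cpu-high",
                                   threshold=80.0, period_seconds=60,
                                   evaluation_periods=1,
                                   comparison="GreaterThanThreshold")

# With boto3 installed and AWS credentials configured:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**high_cpu_alarm)
```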

We’ll leave the Actions section empty at this point. Instead, we click Save Changes to create the Alarm, and open the AWS ECS Console in another browser tab.

Configure Auto Scaling Policies

In the Service view within the AWS ECS Console, click the “Update” button, and on the next page click the “Configure Service Auto Scaling” button. Make sure you specify a reasonable Maximum number of tasks (the hard limit that ECS never exceeds), then click the “Add scaling policy” button. In the dialog that opens, we give the new policy a name and select the previously created Alarm. The last step is to specify the number of tasks to add when the Alarm fires. Click “Save” and finally confirm updating the service.

Fig. 4: ECS Service Auto Scaling
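Behind the scenes, ECS Service Auto Scaling is handled by the Application Auto Scaling service. The sketch below shows the two requests that roughly correspond to the console steps above, as keyword arguments for boto3’s `register_scalable_target` and `put_scaling_policy`. Treat it as an approximation of what the console does, not a record of its exact calls. The cluster and service names, the minimum/maximum task counts and the 60-second cooldown are assumptions.

```python
CLUSTER, SERVICE = "demo-cluster", "web"  # hypothetical names
RESOURCE_ID = f"service/{CLUSTER}/{SERVICE}"
DIMENSION = "ecs:service:DesiredCount"

# register_scalable_target: bounds for the Service's Desired Tasks count.
scalable_target_kwargs = {
    "ServiceNamespace": "ecs",
    "ResourceId": RESOURCE_ID,
    "ScalableDimension": DIMENSION,
    "MinCapacity": 1,
    "MaxCapacity": 10,  # the "Maximum number of tasks" hard limit
}

# put_scaling_policy: add one Task each time the attached Alarm fires.
scale_out_policy_kwargs = {
    "PolicyName": "web-scale-out",
    "ServiceNamespace": "ecs",
    "ResourceId": RESOURCE_ID,
    "ScalableDimension": DIMENSION,
    "PolicyType": "StepScaling",
    "StepScalingPolicyConfiguration": {
        "AdjustmentType": "ChangeInCapacity",
        "StepAdjustments": [
            # No upper bound: covers any breach above the alarm threshold.
            {"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": 1}
        ],
        "Cooldown": 60,  # seconds
        "MetricAggregationType": "Average",
    },
}

# With boto3 installed and AWS credentials configured:
#   import boto3
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**scalable_target_kwargs)
#   policy_arn = aas.put_scaling_policy(**scale_out_policy_kwargs)["PolicyARN"]
#   # The high-CPU Alarm then needs policy_arn in its AlarmActions.
```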

So far, we have created an Alarm that watches our Service CPU utilization. Once utilization rises above the specified threshold, the Alarm triggers the new Auto Scaling Policy, which launches an extra Task within our Service.

Similarly, we’ll set up an Alarm that triggers a scale-down action when the average CPUUtilization falls below a certain threshold (in our example, 10% for 15 minutes). We attach it to a second scaling policy that removes a Task.
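Here is a self-contained sketch of the scale-down pair, again as boto3-style keyword arguments for `put_metric_alarm` and `put_scaling_policy`. The names and the 300-second cooldown are assumptions. "10% for 15 minutes" is modelled here as fifteen consecutive 1-minute averages below 10%, which is one reasonable reading. A single 15-minute period would also work.

```python
CLUSTER, SERVICE = "demo-cluster", "web"  # hypothetical names

# put_metric_alarm: every 1-minute average below 10% for 15 minutes in a row.
low_cpu_alarm_kwargs = {
    "AlarmName": "web-cpu-low",
    "Namespace": "AWS/ECS",
    "MetricName": "CPUUtilization",
    "Dimensions": [
        {"Name": "ClusterName", "Value": CLUSTER},
        {"Name": "ServiceName", "Value": SERVICE},
    ],
    "Statistic": "Average",
    "Period": 60,
    "EvaluationPeriods": 15,
    "Threshold": 10.0,
    "ComparisonOperator": "LessThanThreshold",
}

# put_scaling_policy: remove one Task when the low-CPU Alarm fires.
scale_in_policy_kwargs = {
    "PolicyName": "web-scale-in",
    "ServiceNamespace": "ecs",
    "ResourceId": f"service/{CLUSTER}/{SERVICE}",
    "ScalableDimension": "ecs:service:DesiredCount",
    "PolicyType": "StepScaling",
    "StepScalingPolicyConfiguration": {
        "AdjustmentType": "ChangeInCapacity",
        "StepAdjustments": [
            # No lower bound: covers any breach below the alarm threshold.
            {"MetricIntervalUpperBound": 0.0, "ScalingAdjustment": -1}
        ],
        "Cooldown": 300,  # seconds; a longer cooldown makes scale-in more conservative
        "MetricAggregationType": "Average",
    },
}

minutes_watched = low_cpu_alarm_kwargs["Period"] * low_cpu_alarm_kwargs["EvaluationPeriods"] // 60
print(minutes_watched)  # 15
```

As with the scale-up side, the PolicyARN returned by `put_scaling_policy` would go into the low-CPU Alarm’s AlarmActions.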

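We need to be careful not to make the Alarms too eager. If the scale-down Alarm fires too readily, for example, it may remove a Task just before load picks up again. The two Alarms can then end up repeatedly undoing each other’s work. Note also that every scaling action in ECS takes place only after previous scaling actions have finished and the service has reached a stable state.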
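The "wait for a stable state" behaviour can be pictured with a toy model. This is purely illustrative and does not reflect how ECS is implemented. The hypothetical `ToyService` class accepts a scaling request only when its running Tasks match its desired count, and it never exceeds a hard maximum.

```python
class ToyService:
    """Toy model of a Service whose scaling actions are serialized."""

    def __init__(self, desired, max_tasks):
        self.desired = desired
        self.running = desired
        self.max_tasks = max_tasks  # hard upper limit, like "Maximum number of tasks"

    def is_stable(self):
        return self.running == self.desired

    def request_scaling(self, delta):
        """Apply a scaling request only if the Service is stable; return True if applied."""
        if not self.is_stable():
            return False  # a previous action is still in progress
        self.desired = max(0, min(self.max_tasks, self.desired + delta))
        return True

    def tick(self):
        """Advance time: one Task starts or stops per tick."""
        if self.running < self.desired:
            self.running += 1
        elif self.running > self.desired:
            self.running -= 1


svc = ToyService(desired=2, max_tasks=4)
print(svc.request_scaling(+2))  # True: desired becomes 4
print(svc.request_scaling(+1))  # False: still converging (2 running, 4 desired)
svc.tick()
svc.tick()
print(svc.is_stable(), svc.running)          # True 4
print(svc.request_scaling(+1), svc.desired)  # True 4: capped at max_tasks
```

In this model, a request made while the Service is still converging is simply not applied. With a real Alarm that stays in breach, a later evaluation would trigger the policy again.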

Availability Zone-Aware Scheduling

Because the ECS Service Scheduler is Availability Zone-aware, the Tasks it deploys are evenly distributed across the AZs used by our cluster. This makes the Service highly available and protects it from the failure of a single zone.
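The effect of even AZ distribution can be illustrated with a simplified placement rule: put each new Task into the zone that currently has the fewest Tasks. This is a sketch of the idea, not the ECS scheduler’s actual algorithm. The zone names are only examples.

```python
def spread_across_azs(num_tasks, azs):
    """Place Tasks one by one into the zone with the fewest Tasks so far.

    Ties are broken by list order, so zone counts never differ by more than one.
    """
    placement = {az: 0 for az in azs}
    for _ in range(num_tasks):
        target = min(azs, key=lambda az: placement[az])
        placement[target] += 1
    return placement


placement = spread_across_azs(5, ["us-east-1a", "us-east-1b", "us-east-1c"])
print(placement)  # {'us-east-1a': 2, 'us-east-1b': 2, 'us-east-1c': 1}
print(5 - max(placement.values()))  # 3 Tasks survive the loss of any single zone
```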

Reserve Capacity

As we mentioned above, our cluster runs in an Auto Scaling Group with Alarms configured to adjust the number of ECS Instances in that group.

Deploying Docker containers is fast, especially if we use a smaller number of larger instances rather than a higher number of smaller ones, which is generally recommended. With larger instances, there is a high chance that the Docker image for our extra container has already been pulled to the ECS Instance where the new Task lands. In that case the new container can start within a few seconds, sometimes in under a second.

Launching an ECS Instance to accommodate additional Tasks is a different kind of burrito. Several minutes can pass between triggering an Auto Scaling Group scaling event and being able to actually deploy containers to the newly created ECS Instance.

To prevent situations where our cluster is unable to take on more Tasks, it’s worth tuning the Scaling Policy parameters of the Auto Scaling Group. The goal is to keep some spare ECS cluster capacity in reserve at all times, even under rapidly changing application load.
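One way to reason about that reserve is to count how many more Tasks the cluster could place right now. The sketch below uses made-up free-capacity numbers. It also uses a hypothetical rule: "add an instance when fewer than N extra Tasks fit." In practice, the Auto Scaling Group is often driven by the cluster-level CPUReservation and MemoryReservation CloudWatch metrics instead. The point here is that each instance must be checked separately, because a Task cannot span instances.

```python
def extra_tasks_that_fit(instances, task_cpu, task_memory):
    """How many more copies of a Task the cluster can place right now.

    instances: (free_cpu_units, free_memory_mib) for each ECS Instance.
    Each instance is checked on its own because a Task cannot span instances.
    """
    return sum(
        min(free_cpu // task_cpu, free_memory // task_memory)
        for free_cpu, free_memory in instances
    )


def needs_more_instances(instances, task_cpu, task_memory, reserve_tasks):
    """Hypothetical rule: scale the Auto Scaling Group out when the reserve is gone."""
    return extra_tasks_that_fit(instances, task_cpu, task_memory) < reserve_tasks


cluster = [(1024, 2048), (256, 3000), (2048, 256)]  # made-up free capacity
print(extra_tasks_that_fit(cluster, task_cpu=256, task_memory=512))  # 5
print(needs_more_instances(cluster, 256, 512, reserve_tasks=6))       # True
```

Summing the cluster’s free capacity (3,328 CPU units and 5,304 MiB) would suggest room for 10 such Tasks. Because the free capacity is fragmented across instances, only 5 actually fit.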


Auto Scaling saves compute resources, money, and also energy required to power our infrastructure.

Not only does Auto Scaling promote reasonable use of resources, it can also be one of the pillars of a robust and resilient architecture. The Service keeps the desired number of Tasks running by automatically replacing containers after failures, and its scheduling is Availability Zone-aware.

As we saw, Amazon Web Services provides a complete set of tools for setting up reliable Auto Scaling of ECS Services. Figuring out the right thresholds and CPU and Memory constraints might require a period of tuning and experimentation, but I suggest you consider Auto Scaling as part of your automated setup!


Srini N
2 months ago

Thanks for the article, very useful.
I have created an RStudio Docker container for each user, and I use persistent storage. When a user logs out of my system, I want to stop their container, and when the user comes back, I have to reprovision the same one.
What is the best way to do this?

3 months ago

Thank you Jaroslav for the article. I do have a question about ensuring there is enough ECS capacity. Let’s say I have 1 EC2 instance with my docker containers on them. If I want to autoscale up a container with an alarm, but there is not enough space for it then it should start another EC2 instance and then launch the container … but is that possible? It seems to me that if there isn’t enough capacity then the ECS service will just fail trying to launch another one. If I set an alarm on the EC2 itself to say something like “scale up if memory/cpu reservation > x%” then it would add another EC2 but it would just be empty and costing money until such a time when it is required. However if there was also an EC2 scale down alarm like “scale down if memory/cpu < y%” then the empty instance would just terminate again. How would you automate the autoscaling for the containers and the hosts to solve this, i.e. Scale up a service to add another task, but if there’s no space in the ECS cluster for it then add another EC2 instance and try to add the task again. Thank you :)

Jaroslav Holub
2 months ago

Hi Ehtsham! Scaling the number of ECS Instances goes hand in hand with service auto scaling. I think you’re aiming in the right direction with your question! You can use alarms to adjust your Auto Scaling Group size. In reality, this requires solving a few issues and edge cases. For example, we usually want to drain instances before we remove them from the cluster (described in the link pasted in my previous comment below). Mainly, we need to find the right thresholds for adding and removing ECS instances, based on the requirements of our tasks and the instance type we use.

Aviv Noy
4 months ago

Great article. When your scale-down policy is triggered, how do you gracefully shut down your instances without disrupting your application while it is working?

Jaroslav Holub
3 months ago

Hi Aviv, you need to drain the instance before removing it from your cluster. It can be automated, for example using a Lambda function as is described in
