AWS Tutorial: Auto Scaling Docker Containers in Amazon ECS

Published Apr 10, 2017

In this AWS tutorial, I will demonstrate how, in just a few steps, you can turn a simple containerized web application into an Amazon ECS Service (referred to simply as a Service) that scales automatically in response to changing demand.

Amazon introduced EC2 Container Service (ECS) in 2015 in response to the rapidly growing popularity of Docker containers and microservice architectures. ECS provides a clustering and orchestration layer that controls the lifecycle of containerized deployments on EC2 host machines (called ECS Instances in this context). ECS is an alternative to tools such as Kubernetes, Docker Swarm, or Mesos.

Workloads within ECS are handled by Tasks. Each Task is launched from a Task Definition that describes one or more Docker containers, their port mappings, CPU and memory limits, and other attributes. A Task must always fit on a single ECS Instance: all of its containers run on the same host. Although this is sometimes seen as a limitation, it can also be viewed as an implicit co-location feature, similar to Kubernetes Pods.
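To make the single-instance rule concrete, here is a minimal Python sketch. The dictionary loosely mirrors the shape of an ECS Task Definition (`family`, `containerDefinitions`, and per-container `cpu`, `memory`, and `portMappings`), but the family name, images, and numbers are hypothetical, and the helper functions are illustrative, not part of any AWS SDK. It assumes the ECS convention of 1,024 CPU units per vCPU and memory in MiB.

```python
# Hypothetical two-container Task Definition, shaped like the input of the
# ECS RegisterTaskDefinition API. Names, images, and values are illustrative.
TASK_DEFINITION = {
    "family": "web-app",
    "containerDefinitions": [
        {"name": "nginx", "image": "nginx:latest", "cpu": 128, "memory": 128,
         "portMappings": [{"containerPort": 80, "hostPort": 80}]},
        {"name": "app", "image": "example/web-app:latest",
         "cpu": 384, "memory": 512},
    ],
}


def task_reservation(task_def):
    """Sum the CPU units and memory (MiB) reserved by all containers of a Task."""
    containers = task_def["containerDefinitions"]
    cpu = sum(c.get("cpu", 0) for c in containers)
    memory = sum(c.get("memory", 0) for c in containers)
    return cpu, memory


def task_fits_on_instance(task_def, free_cpu_units, free_memory_mib):
    """A Task is placed as a whole, so ALL its containers must fit on ONE instance."""
    cpu, memory = task_reservation(task_def)
    return cpu <= free_cpu_units and memory <= free_memory_mib


print(task_reservation(TASK_DEFINITION))                   # (512, 640)
print(task_fits_on_instance(TASK_DEFINITION, 1024, 2048))  # True
print(task_fits_on_instance(TASK_DEFINITION, 1024, 512))   # False
```

Because the scheduler places a Task as a whole, a Task that reserves more than any single instance offers can never start, no matter how much total capacity the cluster has.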

What Is Auto Scaling and Why We Need It

A number of tasks can be grouped into a Service. Setting the number of “Desired Tasks” of a Service allows us to scale applications horizontally across ECS Instances to handle more workload in parallel.
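As a back-of-the-envelope illustration of horizontal scaling, the sketch below estimates a Service's desired Task count from expected load. It assumes each Task can serve a roughly fixed, measured number of requests per second; `rps_per_task` and the min/max bounds are hypothetical inputs, not ECS settings. The result is the kind of value that ends up as the Service's desired count (the `desiredCount` parameter of the ECS UpdateService API).

```python
import math


def desired_task_count(expected_rps: float, rps_per_task: float,
                       min_tasks: int = 1, max_tasks: int = 10) -> int:
    """Estimate how many Tasks a Service needs for a given request rate.

    rps_per_task is a hypothetical, measured capacity of a single Task.
    The estimate is rounded up and clamped to [min_tasks, max_tasks].
    """
    if rps_per_task <= 0:
        raise ValueError("rps_per_task must be positive")
    needed = math.ceil(expected_rps / rps_per_task)
    return max(min_tasks, min(max_tasks, needed))


print(desired_task_count(450, 100))   # 5
print(desired_task_count(0, 100))     # 1 (never below the minimum)
print(desired_task_count(5000, 100))  # 10 (capped at the maximum)
```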

Most applications have to deal with load that changes over time, whether gradually or suddenly. Perhaps the application already has an established and fairly predictable traffic pattern. Perhaps its traffic is sensitive to popular stories on social networks, and it receives a sudden crush of visitors a few times a week.

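The naive way of dealing with variable traffic is over-provisioning. This approach is naive mainly because it is static: it only adds a fixed cushion of spare capacity, which we pay for whether or not it is used, and which may still be too small for an unexpected spike.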

To solve the issue, modern cloud infrastructure providers, such as Amazon Web Services, provide us with detailed insight into the utilization of resources and a complete toolkit for setting up effective event-driven Auto Scaling.

Before we start, we need to distinguish between:
Scaling of the number of ECS Instances in a cluster
Scaling in terms of adding or removing Tasks within a Service

While the first type of scaling adjusts the amount of computational resources available to a cluster (shared by all Services in that cluster), the latter scales an application’s capacity to handle load by deploying an appropriate number of containers across the available pool of resources. These two types of Auto Scaling work hand in hand, each on a separate layer of the infrastructure, to keep applications running smoothly.

In the following sections, I will focus on scaling the number of Tasks within a Service, assuming we already have an Auto Scaling Group configured to manage the number of EC2 Instances associated with the cluster.

Prepare a Docker Container for Auto Scaling in ECS

As the first step, we must specify CPU and Memory constraints for the Docker containers in the Task Definition we will associate with our Service. ECS uses these constraints to reserve and limit the resources each container may use, and also to determine overall Service utilization. Based on this utilization, CloudWatch Alarms decide whether the Service should scale out, scale in, or stay the same.

Fig. 1: CPU and Memory constraints in Container Definitions

The following formula illustrates how ECS calculates Service CPU utilization:

$$\text{Service CPU utilization} = \frac{(\text{Total CPU units used by tasks in service}) \times 100}{(\text{Total CPU units reserved in task definition}) \times (\text{number of tasks in service})}$$
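The same formula in Python, as a minimal sketch; the function and parameter names are ours, not part of any AWS API. CPU units used can exceed the reservation when an instance has idle cycles (the container `cpu` value is a relative share, not a hard cap), so the function deliberately does not clip the result at 100%.

```python
def service_cpu_utilization(cpu_units_used: float,
                            cpu_units_reserved_per_task: float,
                            task_count: int) -> float:
    """Service CPU utilization in percent.

    cpu_units_used: total CPU units used by all tasks in the service
    cpu_units_reserved_per_task: CPU units reserved in the task definition
    task_count: number of tasks in the service
    """
    if task_count <= 0 or cpu_units_reserved_per_task <= 0:
        raise ValueError("need at least one task and a positive reservation")
    return cpu_units_used * 100 / (cpu_units_reserved_per_task * task_count)


# 4 tasks reserving 256 units each (1,024 reserved in total), using 768 units.
print(service_cpu_utilization(768, 256, 4))   # 75.0
# Tasks bursting above their reservation push the value past 100%.
print(service_cpu_utilization(1536, 512, 2))  # 150.0
```

With the 80% scale-out threshold used in the example below, the first reading would not trigger scaling, while the second would.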

Finding the right CPU and Memory values for a real-world application is usually a matter of experimentation. There is no one-size-fits-all answer.

Trigger ECS Service Scaling with CloudWatch Alarms and Scaling Policies

Since Auto Scaling is a reactive method of dealing with changing traffic, scaling only happens in response to changes in observed metrics. That’s why we need to define a pair of CloudWatch Alarms: one that will trigger adding extra Tasks to the Service when the CPU utilization is too high, and another that will trigger removing an existing Task when the Service CPU utilization is low enough.
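The logic of such a pair of Alarms can be modelled in a few lines. This is a simplified sketch, not how CloudWatch is implemented: it assumes one average CPU datapoint per minute and uses this tutorial’s example thresholds (above 80% for 1 minute to scale out, below 10% for 15 minutes to scale in) as defaults. Real CloudWatch Alarms also track states (OK, ALARM, INSUFFICIENT_DATA) and have configurable missing-data handling, which this model ignores.

```python
from typing import Sequence


def scaling_decision(per_minute_cpu: Sequence[float],
                     high: float = 80.0, high_minutes: int = 1,
                     low: float = 10.0, low_minutes: int = 15) -> int:
    """Return +1 (add a Task), -1 (remove a Task), or 0 (do nothing).

    The scale-out "alarm" fires when the last `high_minutes` datapoints are
    all above `high`; the scale-in "alarm" fires when the last `low_minutes`
    datapoints are all below `low`.
    """
    recent_high = per_minute_cpu[-high_minutes:]
    if len(recent_high) == high_minutes and all(v > high for v in recent_high):
        return 1
    recent_low = per_minute_cpu[-low_minutes:]
    if len(recent_low) == low_minutes and all(v < low for v in recent_low):
        return -1
    return 0


print(scaling_decision([50, 85]))  # 1
print(scaling_decision([5] * 15))  # -1
print(scaling_decision([5] * 14))  # 0 (not low for long enough yet)
```

Keeping the scale-in window much longer than the scale-out window makes the Service quick to grow and slow to shrink, which guards against removing Tasks during a brief lull.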

How to Configure CloudWatch Alarms

Enough theory, let’s open the AWS Console and configure CloudWatch Alarms and Scaling Policies.

In the CloudWatch Alarms console, click the Create Alarm button. First, we need to select the right metric for the new Alarm. A good metric for scaling a Service up and down is CPUUtilization in the AWS/ECS namespace. We type "ECS CPUUtilization" into the search field to narrow down the list of metrics, and select the checkbox next to the ClusterName/ServiceName combination we want to create the Alarm for.

Fig. 2: Selecting the CPUUtilization CloudWatch Metric

Note: Depending on the nature of our application, we could use the MemoryUtilization metric instead.

After clicking the Next button, we define the Alarm threshold. In our example, we have chosen to fire the alarm every time the average CPUUtilization is higher than 80% for a period of 1 minute. That seems like a good sign that another container would be useful!

Fig. 3: Configuring the CloudWatch Alarm

We’ll leave the Actions section empty for now. Instead, we click Save Changes to create the Alarm and open the AWS ECS Console in another browser tab.
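For readers who prefer scripting to clicking, the same Alarm can be expressed as the keyword arguments of boto3’s CloudWatch `put_metric_alarm` call. This sketch only builds the dictionary, so it runs without AWS credentials; the cluster, service, and alarm names are hypothetical. Like the console walkthrough, it leaves `AlarmActions` unset for now.

```python
def cpu_high_alarm_params(cluster: str, service: str) -> dict:
    """Arguments for CloudWatch put_metric_alarm: average ECS Service CPU
    above 80% for one 1-minute period. The names passed in are hypothetical."""
    return {
        "AlarmName": f"{service}-cpu-high",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,            # seconds per datapoint
        "EvaluationPeriods": 1,  # one breaching datapoint fires the alarm
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = cpu_high_alarm_params("my-cluster", "my-web-service")
print(params["AlarmName"])  # my-web-service-cpu-high

# With AWS credentials configured, the alarm could then be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```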

Configure Auto Scaling Policies

In the Service view of the AWS ECS Console, click the “Update” button, and on the next page, click the “Configure Service Auto Scaling” button. Make sure you specify a reasonable Maximum number of tasks (the hard limit that ECS never exceeds), then click the “Add scaling policy” button. In the dialog that opens, we give the new policy a name and select the previously created Alarm. The last step is to specify the number of tasks to add when the Alarm fires. Click “Save” and finally confirm the service update.

Fig. 4: ECS Service Auto Scaling

So far, we have created an Alarm that watches our Service’s CPU utilization. Once utilization rises above the specified threshold, the Alarm triggers the new Auto Scaling Policy, which launches an extra Task within our Service.
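Behind the console, ECS Service Auto Scaling is handled by the Application Auto Scaling service. As a sketch under that assumption, the dictionaries below correspond to the arguments of boto3’s `application-autoscaling` client calls `register_scalable_target` (the Service’s minimum and maximum task count) and `put_scaling_policy` (a step scaling policy that adds one Task). The cluster and service names and the 60-second cooldown are hypothetical values.

```python
def service_scaling_params(cluster: str, service: str,
                           min_tasks: int, max_tasks: int, step: int = 1):
    """Arguments for Application Auto Scaling register_scalable_target and
    put_scaling_policy (step scaling) for an ECS Service. Names are hypothetical."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    policy = {
        **common,
        "PolicyName": f"{service}-scale-out",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            # Bounds are relative to the alarm threshold: any breach above it
            # (lower bound 0, no upper bound) adds `step` Tasks.
            "StepAdjustments": [
                {"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": step},
            ],
            "Cooldown": 60,  # seconds; hypothetical value
            "MetricAggregationType": "Average",
        },
    }
    return target, policy


target, policy = service_scaling_params("my-cluster", "my-web-service", 1, 10)
print(target["ResourceId"])  # service/my-cluster/my-web-service

# With credentials configured:
#   import boto3
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**target)
#   policy_arn = aas.put_scaling_policy(**policy)["PolicyARN"]
```

Adding the returned `PolicyARN` to the Alarm’s `AlarmActions` is what connects the Alarm to the policy, which is the link the console sets up when we select the Alarm in the policy dialog.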

Similarly, we’ll set up an Alarm that triggers a scale-down action when the average CPUUtilization stays below a certain threshold; in our example, it’s 10% for 15 minutes.
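Scripted, the scale-in Alarm differs from the scale-out one only in its threshold, comparison operator, and evaluation window. In this sketch (again the arguments of CloudWatch `put_metric_alarm`, with hypothetical names), 15 one-minute periods express the 15-minute window. The matching step adjustment for a scale-in step scaling policy uses an upper bound of 0 relative to the threshold and a negative `ScalingAdjustment`.

```python
# Scale-in alarm: average CPUUtilization below 10% for 15 consecutive minutes.
SCALE_IN_ALARM = {
    "AlarmName": "my-web-service-cpu-low",  # hypothetical name
    "Namespace": "AWS/ECS",
    "MetricName": "CPUUtilization",
    "Dimensions": [
        {"Name": "ClusterName", "Value": "my-cluster"},     # hypothetical
        {"Name": "ServiceName", "Value": "my-web-service"},  # hypothetical
    ],
    "Statistic": "Average",
    "Period": 60,             # 1-minute datapoints...
    "EvaluationPeriods": 15,  # ...15 of them in a row = 15 minutes
    "Threshold": 10.0,
    "ComparisonOperator": "LessThanThreshold",
}

# Step adjustment for the scale-in policy: any value below the threshold
# (upper bound 0 relative to it) removes one Task.
SCALE_IN_STEP = {"MetricIntervalUpperBound": 0.0, "ScalingAdjustment": -1}


def alarm_window_minutes(alarm: dict) -> int:
    """How long the condition must hold before the alarm fires."""
    return alarm["Period"] * alarm["EvaluationPeriods"] // 60


print(alarm_window_minutes(SCALE_IN_ALARM))  # 15
```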

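We need to be careful not to make the Alarms too eager. Every scaling action in ECS takes place only after previous scaling actions have finished and the service has reached a stable state, so an Alarm that fires too often does not make scaling any faster, and an overly sensitive scale-in Alarm can remove Tasks that are needed again shortly afterwards.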
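A toy model shows why over-eager Alarms do not speed things up. The `ScalingGate` class below is hypothetical and greatly simplified: it treats "previous action finished and service stable" as a fixed settle time, whereas in ECS the real delay depends on image pulls, container start-up, and health checks. An Alarm firing every minute still produces only one scaling step per settle period.

```python
class ScalingGate:
    """Toy model: a scaling action is accepted only after the previous one
    has settled. 'Settled' is simplified to a fixed number of seconds."""

    def __init__(self, settle_seconds: float):
        self.settle_seconds = settle_seconds
        self.busy_until = float("-inf")

    def try_scale(self, now: float, desired: int, delta: int) -> int:
        """Return the new desired count; ignore the trigger while busy."""
        if now < self.busy_until:
            return desired
        self.busy_until = now + self.settle_seconds
        return desired + delta


gate = ScalingGate(settle_seconds=150)
desired = 2
for t in range(0, 300, 60):  # alarm fires at t = 0, 60, 120, 180, 240 s
    desired = gate.try_scale(t, desired, 1)
print(desired)  # 4: only two of the five triggers took effect
```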

Availability Zone-Aware Scheduling

Because the ECS Service Scheduler is Availability Zone-aware, the tasks it deploys are evenly distributed across the AZs used by our cluster. This makes the infrastructure highly available and helps protect it from the failure of a single zone.
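The spreading behaviour can be pictured as always placing the next Task in the zone that currently runs the fewest Tasks of the Service. This is a simplified sketch of the idea, not the scheduler’s actual algorithm, which also has to respect each instance’s free CPU and memory and any placement constraints; the zone names are only examples.

```python
from collections import Counter
from typing import List


def place_tasks_across_azs(azs: List[str], running: List[str],
                           new_tasks: int) -> List[str]:
    """Pick an AZ for each new Task, always choosing the zone that currently
    runs the fewest Tasks (ties go to the zone listed first)."""
    counts = Counter({az: 0 for az in azs})
    counts.update(running)  # running: the AZ of each Task already deployed
    placements = []
    for _ in range(new_tasks):
        az = min(azs, key=lambda zone: counts[zone])
        counts[az] += 1
        placements.append(az)
    return placements


zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
print(place_tasks_across_azs(zones, ["eu-west-1a", "eu-west-1a", "eu-west-1b"], 3))
# ['eu-west-1c', 'eu-west-1b', 'eu-west-1c']
```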

Reserve Capacity

As we mentioned above, our cluster runs in an Auto Scaling Group with Alarms configured to adjust the number of ECS Instances in that group.

Deploying Docker containers is fast, especially if we use a smaller number of larger instances instead of a higher number of smaller instances, which is generally recommended. With fewer, larger instances, there is a high chance that the Docker image for an extra container has already been pulled to the ECS Instance it lands on, so the new container can start within a few seconds, possibly in under a second.

Launching an ECS Instance to accommodate additional Tasks is a different kind of burrito. Several minutes can pass between triggering an Auto Scaling Group scaling event and being able to actually deploy containers to the newly created ECS Instance.

To avoid situations in which our cluster cannot take on more Tasks, it’s worth tuning the Scaling Policy parameters so that there is always enough spare ECS cluster capacity, even when application load changes rapidly.
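One way to reason about reserve capacity is to count how many more Tasks the cluster could place right now. The helper names and the headroom target below are hypothetical; the per-instance free CPU and memory figures are assumed inputs (ECS reports them per container instance as `remainingResources`). Because a Task must fit on a single instance, leftovers on different instances cannot be added together.

```python
from typing import Iterable, Tuple


def spare_task_slots(free_resources: Iterable[Tuple[int, int]],
                     task_cpu: int, task_memory: int) -> int:
    """Count how many more copies of a Task the cluster could place now.

    free_resources: (free CPU units, free memory in MiB) per ECS Instance.
    Slots are counted per instance; leftovers are never combined.
    """
    return sum(min(cpu // task_cpu, mem // task_memory)
               for cpu, mem in free_resources)


def needs_more_instances(free_resources, task_cpu, task_memory,
                         headroom_tasks):
    """True if the cluster cannot absorb `headroom_tasks` more Tasks."""
    return spare_task_slots(free_resources, task_cpu, task_memory) < headroom_tasks


cluster = [(1024, 2048), (512, 256), (0, 0)]  # three instances
print(spare_task_slots(cluster, 512, 640))         # 2
print(needs_more_instances(cluster, 512, 640, 3))  # True
```

Choosing `headroom_tasks` at least as large as the number of Tasks a scale-out burst might add during the several minutes a new ECS Instance needs to boot keeps the cluster ahead of the Service.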


Auto Scaling saves compute resources, money, and the energy required to power our infrastructure.

Not only does Auto Scaling promote reasonable use of resources, it can also be one of the pillars of a robust and resilient architecture, thanks to its ability to automatically recreate containers after failures and its Availability Zone awareness.

As we saw, Amazon Web Services provides a complete set of tools for setting up reliable Auto Scaling of ECS Services. Even if finding the right thresholds and CPU and Memory constraints takes a period of tuning and experimentation, I suggest you consider Auto Scaling as part of your automated setup!

Discover and read more posts from Jaroslav Holub