3 Steps to Run Scheduled Jobs on AWS Elastic Container Service

Vasilii Trofimchuk
6 min read · Feb 27, 2020

With an ever-growing number of microservices within companies and across the internet, there is still often a need to execute long-running offline tasks. The purpose of such tasks can vary from routine checks and cleanups to complex data analysis and machine learning on collected data. One approach to managing recurring long-running tasks on AWS is to use Elastic Container Service (ECS).

Why ECS?

There are several alternatives to consider before committing to scheduled ECS tasks. The most prominent are Lambda functions and continuously running services. Both have limitations that you need to weigh against your problem.

Lambda functions are limited in code size and execution time; see the AWS Lambda limits documentation for details.

As for continuously running services, they will increase your AWS bill by keeping your resources continuously in use. Moreover, you might need to expose and properly secure a network endpoint to centralize the management of your execution schedule outside of individual containers or machines.

Overview of the approach

In order to create a scheduled ECS FARGATE task, you need to complete the following three steps:

  1. Have a Docker repository (for example, ECR) with an image that you want to run
  2. Create an ECS cluster and define a task with the above image
  3. Configure a CloudWatch Event Rule to periodically launch the ECS task

Throughout the post, I will describe the infrastructure as Terraform configuration, with a full project example available on GitHub.

Creating an ECR repository

As an initial step, you will need to create an ECR repository to store your Docker images. You can follow the official AWS guide for this purpose. For the sample project to work, create a repository named scheduled-ecs.

Getting Docker image ready

Once you have your ECR repository ready, let's create a simple Docker image that will do some difficult networking tasks. In the example below, a Python script calls httpbin and expects a 200 response. Otherwise, the container exits with a non-zero code.

Python script code:
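A minimal sketch of such a script, using only the standard library, might look like the following. The URL, the function names, and the timeout are assumptions made for illustration; the printed messages match the sample output shown further down in this post.

```python
import sys
import urllib.request

# Assumed endpoint; any URL that answers with HTTP 200 works for this check.
HTTPBIN_URL = "https://httpbin.org/get"


def check_endpoint(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def main(opener=urllib.request.urlopen):
    """Run the check and return a process exit code (0 on success)."""
    print("Calling httpbin")
    if check_endpoint(HTTPBIN_URL, opener=opener):
        print("Successfully completed")
        return 0
    print("Request failed", file=sys.stderr)
    return 1
```

To use it as the container entrypoint, end the file with a call such as sys.exit(main()), so that a failed request turns into a non-zero exit code. The opener parameter exists only so that the check can be exercised without network access.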

Dockerfile contents:

Once the docker image is ready, you can try to run it locally to verify that your docker image can successfully call httpbin:

docker build -t scheduled-ecs-task .
docker run scheduled-ecs-task

You should see something like:

# docker run scheduled-ecs-task
Calling httpbin
Successfully completed

Once you have built and verified your image, upload it to the ECR repository (you can find your ECR repository URI on the ECR screen in the AWS console):

eval "$(aws ecr get-login --no-include-email --region us-west-2)"
docker build -t scheduled-ecs .
docker tag scheduled-ecs:latest "YOUR_ECR_REPO_URI:latest"
docker push "YOUR_ECR_REPO_URI:latest"

Defining ECS Cluster, Service and Task

Once you have successfully published your Docker image to the ECR repository, you are ready to create a cluster, a task definition and a service. The following Terraform configuration describes the necessary infrastructure:

The configuration also contains a service. It is not required for scheduled execution and is provided only as an example in case you decide to run your task continuously in the future.

The task definition is sourced from the def.json:
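As a rough illustration of what such a container-definitions file holds, the sketch below builds a comparable structure in Python and prints it as JSON. The container name, the log group, and the CPU and memory values are hypothetical; the keys are standard ECS container definition fields, but the sample project's actual def.json may differ.

```python
import json


def container_definitions(image_uri, log_group, region, cpu=256, memory=512):
    """Build the list that a container-definitions JSON file contains."""
    return [
        {
            "name": "scheduled-ecs-task",  # hypothetical container name
            "image": image_uri,
            "essential": True,
            "cpu": cpu,
            "memory": memory,
            "logConfiguration": {
                # Ship container stdout/stderr to CloudWatch Logs.
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": log_group,
                    "awslogs-region": region,
                    "awslogs-stream-prefix": "ecs",
                },
            },
        }
    ]


definitions = container_definitions(
    image_uri="YOUR_ECR_REPO_URI:latest",
    log_group="/ecs/scheduled-ecs",  # hypothetical log group name
    region="us-west-2",
)
print(json.dumps(definitions, indent=2))
```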

The resources and definitions above reference log and network configurations not covered in this post. To find the related resources, see the full sample project on GitHub.

Scheduling an ECS Task

To tell AWS to run an ECS task periodically, you can follow the official AWS docs that go through the process of setting up CloudWatch Events along with cron scheduling to kick off ECS tasks.

First, you need to create a CloudWatch rule that specifies the schedule on which to trigger a particular CloudWatch event.

Besides rate expressions, you can also use cron expressions to define the schedule.
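To make the two kinds of schedule expression concrete, here is a small sketch of helpers that format them. The helper names are hypothetical. The formats follow the CloudWatch Events conventions as I understand them: a rate uses a singular unit when the value is 1, and a cron expression has six fields in which day-of-month and day-of-week cannot both be specified.

```python
RATE_UNITS = ("minute", "hour", "day")


def rate_expression(value, unit):
    """Format a rate expression such as rate(5 minutes) or rate(1 hour)."""
    if unit not in RATE_UNITS:
        raise ValueError(f"unit must be one of {RATE_UNITS}, got {unit!r}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for a value of 1 and plural otherwise.
    plural = "" if value == 1 else "s"
    return f"rate({value} {unit}{plural})"


def cron_expression(minutes, hours, day_of_month="*", month="*",
                    day_of_week="?", year="*"):
    """Format a six-field cron expression (times are in UTC)."""
    # One of the two day fields has to be the '?' placeholder.
    if "?" not in (day_of_month, day_of_week):
        raise ValueError("either day_of_month or day_of_week must be '?'")
    return f"cron({minutes} {hours} {day_of_month} {month} {day_of_week} {year})"


print(rate_expression(5, "minute"))  # rate(5 minutes)
print(cron_expression(0, 12))        # cron(0 12 * * ? *)
```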

Second, create a CloudWatch event target and point it to the ECS cluster where you want your task to be triggered.

The event target configuration includes an ECS target, which resembles a service configuration (task definition, task count, launch type, network settings) that CloudWatch applies to the cluster every time the Event Rule is triggered. In other words, CloudWatch launches the task directly in the cluster (through ecs:RunTask) rather than inside an existing service.
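The shape of such a target can be sketched as a plain Python dictionary. The field names below follow the CloudWatch Events PutTargets request format as I understand it, not the Terraform syntax used in the sample project; the function name, the target Id, and all ARNs and IDs in the usage example are placeholders.

```python
def ecs_event_target(cluster_arn, task_definition_arn, role_arn,
                     subnets, security_groups, assign_public_ip=True):
    """Describe an event target that launches one FARGATE task."""
    return {
        "Id": "scheduled-ecs-task",  # hypothetical target identifier
        "Arn": cluster_arn,          # the target of the rule is the cluster
        "RoleArn": role_arn,         # role that CloudWatch Events assumes
        "EcsParameters": {
            "TaskDefinitionArn": task_definition_arn,
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": list(subnets),
                    "SecurityGroups": list(security_groups),
                    "AssignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
                }
            },
        },
    }


target = ecs_event_target(
    cluster_arn="YOUR_CLUSTER_ARN",
    task_definition_arn="YOUR_TASK_DEFINITION_ARN",
    role_arn="YOUR_EVENTS_ROLE_ARN",
    subnets=["YOUR_SUBNET_ID"],
    security_groups=["YOUR_SECURITY_GROUP_ID"],
)
```

Note that the target names a task definition and a cluster, but no service, which is why the scheduled task runs on its own.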

You might be wondering why the ECS target's network configuration requires a public IP. A FARGATE task needs a network route to ECR to pull the image, and in a public subnet (without a NAT gateway or VPC endpoints) that route requires a public IP. Otherwise, you will get a CannotPullContainerError / Client.Timeout exceeded while awaiting headers error. To mitigate the security risk of an exposed public IP, you can update the security group to disallow any incoming connections.

Permissions

Historically, both ECS and CloudWatch Events have offered limited capabilities for debugging and troubleshooting setup issues, especially when it comes to permissions and security. Therefore, ensure that (a) your ECS service has the necessary permissions to pull the image from ECR (the AmazonECSTaskExecutionRolePolicy managed policy) and (b) your CloudWatch Event has permission to kick off the ECS task. The policy for the CloudWatch Event must include two required actions, ecs:RunTask and iam:PassRole.
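A policy document with those two actions can be sketched as follows. The function name and the placeholder ARNs are hypothetical, and the condition on iam:PassRole is an optional tightening that I am assuming is appropriate here; adjust the resources to your own task definition and roles.

```python
import json


def events_run_task_policy(task_definition_arn, passed_role_arns):
    """Build an IAM policy that lets CloudWatch Events start an ECS task."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allows the rule to launch the task definition.
                "Effect": "Allow",
                "Action": "ecs:RunTask",
                "Resource": task_definition_arn,
            },
            {
                # Allows handing the task and execution roles over to ECS.
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": list(passed_role_arns),
                "Condition": {
                    "StringLike": {"iam:PassedToService": "ecs-tasks.amazonaws.com"}
                },
            },
        ],
    }


policy = events_run_task_policy(
    task_definition_arn="YOUR_TASK_DEFINITION_ARN",
    passed_role_arns=["YOUR_TASK_EXECUTION_ROLE_ARN"],
)
print(json.dumps(policy, indent=2))
```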

Testing

After you have configured ECR, ECS and CloudWatch Events, you should be able to see the execution metrics in the CloudWatch console. Hopefully, you will see lots of successful executions and no failed ones.

To find a currently running task, go to your Cluster and open the "Tasks" tab.

You can also see a history of finished task executions if you toggle "Desired task status" from "Running" to "Stopped".

Troubleshooting

There are a few reasons why things might not work on the first attempt. Below are a few troubleshooting tips:

ECS cannot pull a container image from ECR. In this case, you will see the following error on your task page: CannotPullContainerError. To solve this issue, check that your task execution role has ECR access and that your service or CloudWatch event target has a public IP enabled.

CloudWatch Event doesn't trigger ECS. This is most probably due to a misconfiguration in the IAM role that CloudWatch uses. Verify that it has both the ecs:RunTask and iam:PassRole permissions.

ECS pulls an image but doesn't seem to do anything or stops without running the code. Because your long-running task might be computationally or memory-intensive, check that you have allocated enough memory and CPU units in both the task and the container definition. Without enough memory or CPU units, the task can stop abruptly.
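For FARGATE tasks, the task-level CPU and memory values also have to form a supported pair. The sketch below encodes the combinations documented by AWS around the time of writing (CPU in units, memory in MiB). Treat the table as an assumption to verify against the current documentation, since larger sizes may have been added since; the function name is hypothetical.

```python
# Supported task sizes, assumed from the AWS documentation at the time of writing.
FARGATE_MEMORY_BY_CPU = {
    256: (512, 1024, 2048),
    512: tuple(range(1024, 4096 + 1, 1024)),     # 1 GB to 4 GB
    1024: tuple(range(2048, 8192 + 1, 1024)),    # 2 GB to 8 GB
    2048: tuple(range(4096, 16384 + 1, 1024)),   # 4 GB to 16 GB
    4096: tuple(range(8192, 30720 + 1, 1024)),   # 8 GB to 30 GB
}


def is_valid_fargate_size(cpu, memory):
    """Return True if the cpu/memory pair is in the table above."""
    return memory in FARGATE_MEMORY_BY_CPU.get(cpu, ())


print(is_valid_fargate_size(256, 512))   # True
print(is_valid_fargate_size(256, 4096))  # False
```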

References

  1. Ready-made terraform module for deploying scheduled ECS task: https://github.com/dxw/terraform-aws-ecs-scheduled-task

Running Docker images on AWS requires a bit of know-how and a bit of luck. Hopefully, Docker support will improve in the future to allow one-click deployments, out-of-the-box rollbacks and build version tracking. Until then, I hope the information above helps you deploy your Docker image to an ECS cluster without unnecessary headaches. Thanks!

Vasilii Trofimchuk

Engineering Lead @ Square, Co-Founder of Sygn — on a journey to create a frustration-free payment experience