
Faster CI/CD pipelines with Docker

Published Apr 22, 2019 · Last updated Oct 18, 2019

The problem

One of the key techniques for moving fast in software development is continuous delivery. However, looking after build agents is a chore. Docker helps by keeping the build agent simple and putting all the build tooling inside the Docker image. This brings with it a new problem: speed. Building dependencies on a new build agent can take quite a while, especially in Java. This article shows you how to speed up your builds with some Docker features.

What's different about this approach?

Using the layer system in Docker, we can separate our code's dependencies into a different layer of the Docker image than the layer the code itself sits in. We can also utilise a newer Docker feature that lets us use a previously saved image as the layer cache for the current build. This way we speed up the build when only the code (and not the dependencies) has changed.
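
As a rough sketch (the registry and image name here are made up), the workflow on a build agent looks something like this:

# Pull the most recently published image so its layers are available locally
docker pull registry.example.com/my-app:latest

# Build, telling Docker it may reuse unchanged layers from the pulled image
docker build --cache-from registry.example.com/my-app:latest --tag registry.example.com/my-app:latest .

# Push the result so the next (possibly brand new) build agent can use it as its cache
docker push registry.example.com/my-app:latest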

Tech stack

We'll cover examples in

  • Ruby
  • Python
  • Java
  • JavaScript

All will use Docker and we'll throw in AWS services like CodePipeline, CodeBuild and Elastic Container Registry (ECR) as a simple example of how to get a build pipeline running.

The nitty gritty

Different programming languages have different package dependency management systems: Bundler for Ruby, pip for Python, Maven (or Gradle) for Java, and npm or yarn for JavaScript.

For the purposes of installing dependencies, they all do about the same thing:

  • look in a configuration file for the dependencies you've declared
  • go and find the packages for those dependencies in the official repository
  • download them to your local system
  • make them available to your code

For example, if I want to use Redux in my JS React app, I can yarn add react-redux and I'll end up with a package.json file containing a reference to react-redux (as well as a local install in my node_modules). Since I don't want to rely on people remembering to do these installs correctly when deploying to servers, I don't store all the dependencies in git; I just store the package.json file. If someone else gets my code from GitHub, they can yarn install and they'll get react-redux (along with all the other stuff in the package.json file).
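
For illustration, a stripped-down package.json might look like this (the name and version numbers are only examples):

{
  "name": "my-react-app",
  "dependencies": {
    "react": "^16.8.0",
    "react-dom": "^16.8.0",
    "react-redux": "^7.0.0",
    "redux": "^4.0.0"
  }
}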

Here is the issue: downloading and building those dependencies on a build server often takes a long time. Even if they were installed during a previous build, it's common to scale up build agents in times of demand and then destroy them later for cost effectiveness. This means my build agents are usually brand new and have to do fresh builds.

Docker to the rescue

Docker has two features that help:

  • layer caching: each instruction in a Dockerfile produces a layer, and a layer only needs rebuilding when it (or a layer before it) changes
  • the --cache-from option to docker build, which lets a previously saved image act as the layer cache for the current build

If we take Java as an example, we can use the following Dockerfile-tests to run our tests:

FROM openjdk:8-jdk-alpine
RUN apk update
RUN apk add maven
WORKDIR /opt/code
# Copy only the dependency manifest first, so the layers below only change when pom.xml changes
COPY ./pom.xml .
# Download all declared dependencies into this layer
RUN mvn dependency:go-offline
# Now bring in the rest of the code; changes here don't invalidate the dependency layers above
COPY . .

This means the dependencies are downloaded in the RUN mvn dependency:go-offline layer, while the rest of our code only arrives with the final COPY . . (so the dependency layers are untouched by code-only changes).

We can then provide a CI config that uses docker build with --cache-from to ensure we only ever have to run the dependency layer when the dependencies change.

As an example, let's describe a couple of CodeBuild configurations. If we assume we have an ECR repo set up at 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img, we can configure a few steps in CodePipeline:

  • Step 1: get the source from source control (e.g. GitHub)
  • Step 2: use a standard CodeBuild container to build the test docker image
  • Step 3: use the new test image directly in CodeBuild to run the tests

Step 1

Create a GitHub webhook via CodePipeline. This can be done in the console, via the CLI or with CloudFormation.
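
As a hedged CloudFormation sketch (the resource names and the Secrets Manager secret are hypothetical, and a Pipeline resource is assumed to exist in the same template), the webhook might look something like this:

# Hypothetical snippet: registers a GitHub webhook against an existing pipeline
SourceWebhook:
  Type: AWS::CodePipeline::Webhook
  Properties:
    Authentication: GITHUB_HMAC
    AuthenticationConfiguration:
      SecretToken: '{{resolve:secretsmanager:github-webhook-secret}}'
    Filters:
      - JsonPath: '$.ref'
        MatchEquals: refs/heads/master
    TargetPipeline: !Ref Pipeline
    TargetAction: Source
    TargetPipelineVersion: !GetAtt Pipeline.Version
    RegisterWithThirdParty: true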

Step 2

This is an example buildspec that logs in to ECR, pulls the last test image, builds the new test image using the last one as a cache and pushes the new one to ECR:

version: 0.2
phases:
  pre_build:
    commands:
      # Log in to ECR and pull the previous image so its layers can seed the build cache
      - $(aws ecr get-login --no-include-email --region ap-southeast-2)
      - docker pull 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest
  build:
    commands:
      # Rebuild the test image, reusing unchanged layers from the pulled image
      - docker build --tag 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest --file Dockerfile-tests --cache-from 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest .
  post_build:
    commands:
      # Push the new image so the next build can use it as a cache
      - docker push 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest
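
One practical note: on the very first run there is nothing in ECR to pull, so the pre_build phase above will fail. A common workaround (an assumption on my part, not in the original pipeline) is to let that single command fail gracefully:

      - docker pull 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest || true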

Step 3

The buildspec to run the tests in the next CodePipeline step is simple if the step is configured to use the 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest image we just created:

version: 0.2
phases:
  build:
    commands:
      - mvn test
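
Pointing the CodeBuild step at that image can be done in the console, or in CloudFormation with something like the following sketch (the project name and CodeBuildRole are hypothetical; the key settings are Image and ImagePullCredentialsType):

TestProject:
  Type: AWS::CodeBuild::Project
  Properties:
    Name: run-tests
    ServiceRole: !GetAtt CodeBuildRole.Arn
    Source:
      Type: CODEPIPELINE
    Artifacts:
      Type: CODEPIPELINE
    Environment:
      Type: LINUX_CONTAINER
      ComputeType: BUILD_GENERAL1_SMALL
      # Use the test image pushed to ECR instead of a standard CodeBuild image
      Image: 999999999.dkr.ecr.ap-southeast-2.amazonaws.com/some-img:latest
      # Pull the custom image from ECR using the project's service role
      ImagePullCredentialsType: SERVICE_ROLE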

The whole process is similar for other languages and package management systems. Just put the package file and install commands before the rest of the code.

Ruby:

COPY ./Gemfile .
RUN bundle install
COPY . .

Python:

COPY ./requirements.txt .
RUN pip install -r requirements.txt
COPY . .

JavaScript:

COPY ./package.json .
RUN yarn install
COPY . .

etc.

Final thoughts and next steps

Doing this saves a lot of build agent maintenance and a lot of build time. It works equally well with any other build system that can run Docker containers, such as Buildkite. If you need help with your setup, get in touch.

About me

I'm a Principal Engineer with programming experience in Java, Python, Ruby, JavaScript and C#, and a rusty recollection of LabVIEW, C++, Visual Basic and ColdFusion. I've dabbled in Haskell.

I have deep AWS experience and some knowledge in Azure and GCP.


Appendix - Tech mentioned in this post

Docker

Docker is a way to package applications into a container that includes all the files necessary to run the application, including operating system files, but not the operating system kernel. In contrast, a virtual machine (VM) contains a kernel and virtualised hardware interfaces.

Amazon Web Services

AWS is a cloud services provider, where computing power, networking and other services are provided on-demand. It allows for infrastructure as code and helps teams spend time on solving customer problems rather than looking after datacentres.
