Level up your Dockerfiles with these tips and tricks

It’s easy to write Dockerfiles that work, but also to write Dockerfiles that suck. Here are some tips and tricks for writing better Dockerfiles.

[Illustration: a Docker whale with a mildly panicked look on its face, stranded and facing a human hunter. Caption: “Whale, whale, look what we have here?”]

Less is more

Keep your Dockerfile small, and your Docker images smaller. The less an image contains, the faster it can be downloaded and the less likely it is to contain security vulnerabilities.

There are several ways to keep your images as slim as possible.

Smaller base images

Almost any Docker image that you download from a registry or build yourself is directly or indirectly based on a parent image that includes an operating system like Ubuntu, Debian, or Alpine.

Of these three, Ubuntu is usually the largest, followed by Debian and then Alpine. Ubuntu and Debian are both close to 30MB, while Alpine takes up a mere 3MB. For many applications it doesn’t really matter which operating system (and thus base image) you choose, which automatically makes Alpine the most logical choice.

Having said that, Alpine has been the root cause of so many production outages in the past five years that I nowadays prefer Debian. Debian is often about 20MB larger than Alpine, but the difference in size quickly becomes negligible as applications grow larger: those 20MB don’t really matter anymore if your node_modules alone takes up 1GB.

.dockerignore

Many Dockerfiles contain a COPY instruction that copies all files from a host directory to the container:
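```dockerfile
# Copy everything from the build context into the image's working directory
COPY . .
```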

This is rarely what you actually want. Firstly, the host directory is likely to contain a lot of files that you don’t need in the image, like documentation, test code, and .env files. Secondly, in the case of Node.js projects, you want the image to have its node_modules directory created from scratch, rather than based on whatever version of node_modules you happened to have on the host machine.

To exclude unnecessary files and directories from Docker images, add a .dockerignore file to the root directory of your build context. In this file, you can list everything that you want Docker to ignore, like so:
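```
# Illustrative .dockerignore for a Node.js project; your entries will differ

# Dependencies that should be installed inside the image itself
node_modules

# Local secrets and configuration
.env

# Files that aren't needed at runtime
docs
test
```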

If you have many files that you want to ignore, or want to be really sure that files won’t accidentally end up in the Docker image, consider the following approach where you first exclude everything, then define an allowlist of files that should be copied into the image:
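```
# A sketch of the allowlist approach; the allowed entries are just examples

# Exclude everything by default...
**

# ...then explicitly allow the files that the image actually needs
!package.json
!package-lock.json
!src/**
```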

Multi-stage Dockerfiles

Multi-stage Dockerfiles are another popular way to keep images small. Defining multiple stages, which can be seen as build steps, allows you to selectively copy build artifacts from one stage to another.

The Dockerfile below consists of two stages. The first stage compiles Go source code into a standalone binary at /bin/derp. This stage is based on the golang:latest image, which is almost 300MB in size as it includes everything you need to compile Go applications – and more.

As the compiled artifact can be run as a standalone binary that doesn’t require any of those 300MBs, we add a second stage that literally only contains the /bin/derp binary.
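Put together, the two stages could look roughly like this (the scratch base image and the exact go build command are illustrative assumptions; the source only mentions golang:latest and /bin/derp):

```dockerfile
# Stage 1: compile the Go source code into a standalone, static binary
FROM golang:latest AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/derp .

# Stage 2: start from an empty image and copy in only the binary
FROM scratch
COPY --from=build /bin/derp /bin/derp
ENTRYPOINT ["/bin/derp"]
```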

Running things

Most complexity within Dockerfiles can be found in RUN and CMD instructions. There are a few things you can do here that improve your Docker images in subtle ways.

Combining RUNs

Each RUN, COPY, and ADD instruction in a Dockerfile creates a new layer. Increasing the number of layers typically results in a larger image size and a longer build time, as each RUN instruction creates an intermediate container. For example, the following snippet creates three layers:
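```dockerfile
# Illustrative example; the exact packages don't matter
RUN apt-get update
RUN apt-get install -y build-essential
RUN rm -rf /var/lib/apt/lists/*
```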

The first two layers add data to the image. Some of that data is removed in the third layer, but the “damage” has already been done: files that are deleted in a later layer still exist in the earlier layers, and therefore still count towards the image size!

It’s much better to combine these into a single RUN instruction:
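```dockerfile
# The same illustrative work done in a single layer,
# so the deleted files never end up in the image
RUN apt-get update && \
    apt-get install -y build-essential && \
    rm -rf /var/lib/apt/lists/*
```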

In this case, only a single layer is created that does not contain any unnecessary data.

Shell versus exec form

The CMD instruction is used to define the default command of a container, or the default arguments for an ENTRYPOINT instruction. CMD instructions can be written in two forms: shell form and exec form.

The shell form is easier to write and is written in the same way as RUN instructions:
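```dockerfile
# Shell form: a hypothetical Node.js application, started via /bin/sh -c
CMD node server.js
```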

This starts a shell process that executes the command. You can therefore use everything that can be used in a regular shell, like variables, pipes, and && operators.

The exec form is defined as a JSON array with double-quoted strings:
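```dockerfile
# Exec form: the same hypothetical command, run directly without a shell
CMD ["node", "server.js"]
```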

Commands expressed using exec form do not support shell-specific features and are more cumbersome to write. However, they are executed directly, without an intermediate shell process. This means that, unlike with shell form, the container’s process is capable of receiving SIGTERM (and other) signals, which allows it to terminate gracefully. This makes exec form the recommended format for most use cases.

Speeding things up

Building Docker images can take up a lot of time. This is especially noticeable when your image has a lot of external dependencies. Fortunately, the build process can be made faster in two different ways.

Carefully creating layers

Although you should generally combine instructions, there are situations where it makes sense to use separate instructions.

Docker image layers are cached and reused in future builds if the instructions and their context (any files and folders that are referenced in the instruction) remain unchanged.

You can take advantage of this caching mechanism by installing application dependencies in a separate COPY and RUN instruction:
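```dockerfile
# Illustrative Node.js example using npm

# Copy only the dependency manifests first...
COPY package.json package-lock.json ./

# ...and install dependencies in their own layer
RUN npm ci

# Only then copy the rest of the application code
COPY . .
```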

This creates an extra layer that Docker can reuse in future builds, as long as your package.json and package-lock.json do not change!

Using caches

Normally, when you execute docker build in a clean environment (like most CI runners), Docker builds your image entirely from scratch. This is good for reproducibility, as you’re guaranteed that the image can still be built in its entirety. However, it’s also very wasteful, as you end up downloading the same packages over and over again. This takes precious CI time and computing resources that are better spent elsewhere (or not at all).

Cache mounts let you specify caches that persist between builds so that packages only have to be downloaded once, when they are first installed or updated.

To create a cache mount, use the --mount flag together with the RUN instruction: --mount=type=cache,target=<path>. The value of <path> is different for each package manager. For example, Composer stores downloaded packages in /root/.cache/composer by default:
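```dockerfile
# A sketch, assuming a base image that has Composer installed
# and that BuildKit is used for the build
RUN --mount=type=cache,target=/root/.cache/composer \
    composer install --no-dev
```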

Bonus: debugging builds

If a command unexpectedly fails as part of the Docker image build process, an error message will be displayed on the screen or written in the log. However, any output that was generated in previous build steps is no longer available.

You can pass --progress=plain to the docker build command, or set the BUILDKIT_PROGRESS environment variable to plain before running any docker build commands. It’s generally a good idea to do this in CI builds, so that you always have the ability to debug failed builds.
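For example:

```sh
# Either pass the flag directly...
docker build --progress=plain .

# ...or set the equivalent environment variable, e.g. in your CI configuration
export BUILDKIT_PROGRESS=plain
docker build .
```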