Level up your Dockerfiles with these tips and tricks

It’s easy to write Dockerfiles that work, but also to write Dockerfiles that suck. Here are some tips and tricks for writing better Dockerfiles.

[Illustration: a Docker whale with a mildly panicked look on its face, stranded and facing a human hunter. Caption: “Whale, whale, look what we have here?”]

Less is more

Keep your Dockerfile small, and your Docker images smaller. The less an image contains, the faster it can be downloaded and the less likely it is to contain security vulnerabilities.

There are several ways to keep your images as slim as possible.

Smaller base images

Almost any Docker image that you download from a registry or build yourself is directly or indirectly based on a parent image that includes an operating system like Ubuntu, Debian, or Alpine.

Of these three, Ubuntu is usually the largest, followed by Debian and then Alpine. Ubuntu and Debian are both close to 30MB, while Alpine takes up a mere 3MB. For many applications it doesn’t really matter which operating system (and thus base image) you choose, which automatically makes Alpine the most logical choice.

Having said that, Alpine has been the root cause of so many production outages in the past five years that I nowadays prefer Debian. Debian is often about 20MB larger than Alpine, but the difference in size quickly becomes negligible as applications grow larger: those 20MB don’t really matter anymore if your node_modules alone takes up 1GB.

.dockerignore

Many Dockerfiles contain a COPY instruction that copies all files from a host directory to the container:
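```dockerfile
# Copy everything from the build context into the image's working directory
COPY . .
```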

This is rarely what you actually want. Firstly, the host directory is likely to contain a lot of files that you don’t need in the image, like documentation, test code, and .env files. Secondly, in the case of Node.js projects, you want the image to have its node_modules directory created from scratch, rather than based on whatever version of node_modules you happened to have on the host machine.

To exclude unnecessary files and directories from Docker images, add a .dockerignore file to the root directory of your build context. In this file, you can list everything that you want Docker to ignore, like so:
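```
# Illustrative .dockerignore for a Node.js project; your entries will differ

# Dependencies that should be installed inside the image itself
node_modules

# Local secrets and configuration
.env

# Files that aren't needed at runtime
docs
test
```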

If you have many files that you want to ignore, or want to be really sure that files won’t accidentally end up in the Docker image, consider the following approach where you first exclude everything, then define an allowlist of files that should be copied into the image:
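```
# A sketch of the allowlist approach; the allowed entries are just examples

# Exclude everything by default...
**

# ...then explicitly allow the files that the image actually needs
!package.json
!package-lock.json
!src/**
```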

Multi-stage Dockerfiles

Multi-stage Dockerfiles are another popular way to keep images small. Defining multiple stages, which can be seen as build steps, allows you to selectively copy build artifacts from one stage to another.

The Dockerfile below consists of two stages. The first stage compiles Go source code into a standalone binary at /bin/derp. This stage is based on the golang:latest image, which is almost 300MB in size as it includes everything you need to compile Go applications – and more.

As the compiled artifact can be run as a standalone binary that doesn’t require any of those 300MBs, we add a second stage that literally only contains the /bin/derp binary.
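Put together, the two stages could look roughly like this (the scratch base image and the exact go build command are illustrative assumptions; the source only mentions golang:latest and /bin/derp):

```dockerfile
# Stage 1: compile the Go source code into a standalone, static binary
FROM golang:latest AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/derp .

# Stage 2: start from an empty image and copy in only the binary
FROM scratch
COPY --from=build /bin/derp /bin/derp
ENTRYPOINT ["/bin/derp"]
```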

Running things

Most complexity within Dockerfiles can be found in RUN and CMD instructions. There are a few things you can do here that improve your Docker images in subtle ways.

Combining RUNs

Each RUN, COPY, and ADD instruction in a Dockerfile creates a new layer. Increasing the number of layers typically results in a larger image size and a longer build time, as each RUN instruction creates an intermediate container. For example, the following snippet creates three layers:
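```dockerfile
# Illustrative example; the exact packages don't matter
RUN apt-get update
RUN apt-get install -y build-essential
RUN rm -rf /var/lib/apt/lists/*
```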

The first two layers add data to the image. Some of that data is removed in the third layer, but the “damage” has already been done: files that are deleted in a later layer still exist in the earlier layers, and therefore still count towards the image size!

It’s much better to combine these into a single RUN instruction:
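```dockerfile
# The same illustrative work done in a single layer,
# so the deleted files never end up in the image
RUN apt-get update && \
    apt-get install -y build-essential && \
    rm -rf /var/lib/apt/lists/*
```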

In this case, only a single layer is created that does not contain any unnecessary data.

Shell versus exec form

The CMD instruction is used to define the default command of a container, or the default arguments for an ENTRYPOINT instruction. CMD instructions can be written in two forms: shell form and exec form.

The shell form is easier to write and is written in the same way as RUN instructions:
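```dockerfile
# Shell form: a hypothetical Node.js application, started via /bin/sh -c
CMD node server.js
```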

This starts a shell process that executes the command. You can therefore use everything that can be used in a regular shell, like variables, pipes, and && operators.

The exec form is defined as a JSON array with double-quoted strings:
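```dockerfile
# Exec form: the same hypothetical command, run directly without a shell
CMD ["node", "server.js"]
```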

Commands expressed using exec form do not support shell-specific features and are more cumbersome to write. However, they are executed directly, without an intermediate shell process. This means that, unlike with shell form, the container’s process is capable of receiving SIGTERM (and other) signals, which allows it to terminate gracefully. This makes exec form the recommended format for most use cases.

Speeding things up

Building Docker images can take up a lot of time. This is especially noticeable when your image has a lot of external dependencies. Fortunately, the build process can be made faster in two different ways.

Carefully creating layers

Although you should generally combine instructions, there are situations where it makes sense to use separate instructions.

Docker image layers are cached and reused in future builds if the instructions and their context (any files and folders that are referenced in the instruction) remain unchanged.

You can take advantage of this caching mechanism by installing application dependencies in a separate COPY and RUN instruction:
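```dockerfile
# Illustrative Node.js example using npm

# Copy only the dependency manifests first...
COPY package.json package-lock.json ./

# ...and install dependencies in their own layer
RUN npm ci

# Only then copy the rest of the application code
COPY . .
```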

This creates an extra layer that Docker can reuse in future builds, as long as your package.json and package-lock.json do not change!

Using caches

Normally, when you execute docker build in a clean environment (like most CI runners), Docker builds your image entirely from scratch. This is good for reproducibility, as you’re guaranteed that the image can still be built in its entirety. However, it’s also very wasteful, as you end up downloading the same packages over and over again. This takes precious CI time and computing resources that are better spent elsewhere (or not at all).

Cache mounts let you specify caches that persist between builds so that packages only have to be downloaded once, when they are first installed or updated.

To create a cache mount, use the --mount flag together with the RUN instruction: --mount=type=cache,target=<path>. The value of <path> is different for each package manager. For example, Composer stores downloaded packages in /root/.cache/composer by default:
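```dockerfile
# A sketch, assuming a base image that has Composer installed
# and that BuildKit is used for the build
RUN --mount=type=cache,target=/root/.cache/composer \
    composer install --no-dev
```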

Bonus: debugging builds

If a command unexpectedly fails as part of the Docker image build process, an error message will be displayed on the screen or written in the log. However, any output that was generated in previous build steps is no longer available.

You can pass --progress=plain to the docker build command, or set the BUILDKIT_PROGRESS environment variable to plain before running any docker build commands. It’s generally a good idea to do this in CI builds, so that you always have the ability to debug failed builds.
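For example:

```sh
# Either pass the flag directly...
docker build --progress=plain .

# ...or set the equivalent environment variable, e.g. in your CI configuration
export BUILDKIT_PROGRESS=plain
docker build .
```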