Level up your Dockerfiles with these tips and tricks
Keep your Dockerfile small, and your Docker images smaller. The less an image contains, the faster it can be downloaded and the less likely it is to contain security vulnerabilities.
There are several ways to keep your images as slim as possible.
Almost any Docker image that you download from a registry or build yourself is directly or indirectly based on a parent image that includes an operating system like Ubuntu, Debian, or Alpine.
Of these three, Ubuntu is usually the largest, followed by Debian and then Alpine. Ubuntu and Debian are both close to 30MB, while Alpine takes up a mere 3MB. For many applications it doesn’t really matter which operating system (and thus base image) you choose, which would seem to make Alpine the most logical choice.
Having said that, Alpine has been the root cause of so many production outages in the past five years that I nowadays prefer Debian. Debian is often about 20MB larger than Alpine, but the difference in size quickly becomes negligible as applications grow larger: those 20MB don’t really matter anymore if your node_modules directory alone takes up 1GB.
Many Dockerfiles contain a COPY instruction that copies all files from a host directory to the container:
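A typical instance of such an instruction might look like the sketch below. The node:20-slim base image and the /app working directory are assumptions chosen for illustration.

```dockerfile
FROM node:20-slim
WORKDIR /app
# Copies the entire build context into the image, including documentation,
# tests, .env files and any node_modules directory present on the host.
COPY . .
```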
This is rarely what you actually want. Firstly, the host directory is likely to contain a lot of files that you don’t need in the image, like documentation, test code, and .env files. Secondly, in the case of Node.js, you want the image to have its node_modules directory created from scratch, rather than based on whatever version of node_modules you happened to have on the host machine.
To exclude unnecessary files and directories from Docker images, add a .dockerignore file to the root directory of your build context. In this file, you can list everything that you want Docker to ignore, like so:
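A minimal sketch of such a file, assuming a Node.js project. The directory and file names are hypothetical and should be adapted to your own repository.

```dockerignore
# Dependencies are installed inside the image instead
node_modules
# Secrets and local configuration
.env
# Not needed at runtime
.git
docs
test
*.md
```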
If you have many files that you want to ignore, or want to be really sure that files won’t accidentally end up in the Docker image, consider the following approach where you first exclude everything, then define an allowlist of files that should be copied into the image:
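An allowlist of this kind could look roughly as follows. The entries after the first pattern are hypothetical examples. A pattern that starts with an exclamation mark re-includes something that an earlier pattern excluded.

```dockerignore
# Ignore everything by default
*
# Then re-include only what the image needs
!package.json
!package-lock.json
!src
```

With this setup, a newly added file stays out of the build context until you explicitly allow it.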
Multi-stage Dockerfiles are another popular way to keep images small. Defining multiple stages, which can be seen as build steps, allows you to selectively copy build artifacts from one stage to another.
The Dockerfile below consists of two stages. The first stage compiles Go source code into a standalone binary at /bin/derp. This stage is based on the golang:latest image, which is almost 300MB in size as it includes everything you need to compile Go applications – and more.
As the compiled artifact can be run as a standalone binary that doesn’t require any of those 300MB, we add a second stage that literally only contains the /bin/derp binary.
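A sketch of what such a two-stage Dockerfile might look like. The stage name build, the /src working directory, the use of scratch as the final base image, and the assumption that the build context holds a Go module with a main package at its root are all illustrative choices, not taken from the original.

```dockerfile
# Stage 1: compile the Go source into a standalone binary.
FROM golang:latest AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 yields a statically linked binary that needs no C library.
RUN CGO_ENABLED=0 go build -o /bin/derp .

# Stage 2: start from an empty image and copy in only the binary.
FROM scratch
COPY --from=build /bin/derp /bin/derp
CMD ["/bin/derp"]
```

Only the last stage ends up in the final image, so the Go toolchain from the first stage is left behind.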
Most complexity within Dockerfiles can be found in RUN and CMD instructions. There are a few things you can do here that improve your Docker images in subtle ways.
Each RUN, COPY, and ADD instruction in a Dockerfile creates a new layer. Increasing the number of layers typically results in a larger image and a longer build, as every layer stores its own filesystem changes and each RUN instruction is executed in its own intermediate container. For example, the following snippet creates three layers:
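One way such a snippet could look, assuming a Debian-based image and curl as a stand-in for whatever package you need:

```dockerfile
FROM debian:bookworm-slim
# Layer 1: downloads the package index
RUN apt-get update
# Layer 2: installs the package
RUN apt-get install -y --no-install-recommends curl
# Layer 3: deletes the package index, which is still stored in layer 1
RUN rm -rf /var/lib/apt/lists/*
```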
The first two layers add data to the image. Some of that data is removed in the third layer, but the “damage” has already been done: the data is stored in the earlier layers and still ships with the image!
It’s much better to combine these into a single RUN instruction:
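A combined version might look like this, again assuming a Debian-based image with curl as a placeholder package:

```dockerfile
FROM debian:bookworm-slim
# Update, install and clean up in one instruction, so the package index
# is removed before the layer is committed.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
```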
In this case, only a single layer is created that does not contain any unnecessary data.
The CMD instruction defines the default command for a container, or the default arguments for an ENTRYPOINT instruction. CMD instructions can be written in two forms: shell form and exec form.
The shell form is easier to write and is written in the same way as RUN instructions:
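For illustration, a shell-form CMD could look like this. The server.js file, the PORT variable and the node:20-slim image are hypothetical.

```dockerfile
FROM node:20-slim
WORKDIR /app
COPY server.js .
# Shell form: Docker wraps the command in /bin/sh -c "...", so the
# variable and the && operator are interpreted by the shell.
CMD echo "Starting on port $PORT" && node server.js
```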
This starts a shell process that executes the command. You can therefore use everything that can be used in a regular shell, like variables, pipes, and && operators.
The exec form is defined as a JSON array with double-quoted strings:
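A sketch of the same kind of command in exec form, using the hypothetical server.js file and node:20-slim image as placeholders:

```dockerfile
FROM node:20-slim
WORKDIR /app
COPY server.js .
# Exec form: node is started directly, without a wrapping shell.
CMD ["node", "server.js"]
```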
Commands expressed using exec form do not support shell-specific features and are more cumbersome to write. However, they are executed directly rather than through a wrapping shell. This means that unlike with shell form, the container’s process is capable of receiving SIGTERM (and other) signals, which allows it to terminate gracefully. This makes exec form the recommended format for most use cases.
Building Docker images can take up a lot of time. This is especially noticeable when your image has a lot of external dependencies. The build process can be made faster in two different ways.
Although you should generally combine instructions, there are situations where it makes sense to use separate instructions.
Docker image layers are cached and reused in future builds if the instructions and their context (any files and folders that are referenced in the instruction) remain unchanged.
You can take advantage of this caching mechanism by installing application dependencies in a separate COPY and RUN instruction:
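A sketch of this pattern for a Node.js application. The node:20-slim image, the /app directory, the use of npm ci and the server.js entry point are assumptions for the sake of the example.

```dockerfile
FROM node:20-slim
WORKDIR /app
# Copy only the dependency manifests first, so this layer and the next
# one stay cached for as long as these two files are unchanged.
COPY package.json package-lock.json ./
RUN npm ci
# Source code changes only invalidate the layers from here on.
COPY . .
CMD ["node", "server.js"]
```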
This creates an extra layer that Docker can reuse in future builds, as long as your package.json and package-lock.json do not change!
Normally, when Docker cannot reuse a cached layer, docker build re-runs that instruction from scratch. This is good for reproducibility, as you’re guaranteed that the image can still be built in its entirety. However, it’s also very wasteful as you are downloading the same packages over and over again. This takes precious CI time and computing resources that are better spent elsewhere (or not at all).
Cache mounts let you specify caches that persist between builds so that packages only have to be downloaded once, when they are first installed or updated.
To create a cache mount, use the --mount flag together with the RUN instruction: --mount=type=cache,target=<path>. The value of <path> is different for each package manager. For example, Composer stores downloaded packages in /root/.cache/composer by default:
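A sketch of a cache mount for Composer. It assumes BuildKit is enabled and the build runs as root. The php:8.3-cli base image, the way Composer is copied in from the composer:2 image, and the unzip package are illustrative choices rather than requirements.

```dockerfile
# syntax=docker/dockerfile:1
FROM php:8.3-cli
# Composer needs unzip (or git) to unpack downloaded packages.
RUN apt-get update \
 && apt-get install -y --no-install-recommends unzip \
 && rm -rf /var/lib/apt/lists/*
# Hypothetical setup: take the Composer binary from the official image.
COPY --from=composer:2 /usr/bin/composer /usr/bin/composer
WORKDIR /app
COPY composer.json composer.lock ./
# The cache directory persists between builds but is not part of the image.
RUN --mount=type=cache,target=/root/.cache/composer \
    composer install --no-interaction
```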
If a command unexpectedly fails as part of the Docker image build process, an error message will be displayed on the screen or written in the log. However, any output that was generated in previous build steps is no longer available.
To keep the complete output, you can pass --progress=plain to the docker build command or set the BUILDKIT_PROGRESS environment variable to plain prior to any docker build commands. It’s generally a good idea to do this in CI builds so that you always have the ability to debug failed builds.