This is an updated version of a 2022 post about how you can use dive to inspect the contents of an image.
In Docker layer caching for GitHub Actions, we covered how using the existing layer cache is fundamental to speeding up Docker image builds. The less work we have to redo across builds, the faster our builds will be.
But leveraging the cache is only one part of the equation for making docker build as fast as possible.
Another part of the equation is reducing the overall image size. This post looks at how smaller images improve build time, along with the other benefits of keeping images small. We will use a popular open-source project, dive, to help analyze a Docker image, stepping through each layer to see what files it adds and how it affects the total image size.
Our example Docker image
We will use an example Node project with an ordinary Dockerfile someone may write when getting started. It has the following directory structure:
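A rough sketch of that layout, reconstructed only from the files and folders this post names. The yarn.lock and tsconfig.json files are mentioned later in the post and are assumed to sit at the project root; the contents of each folder are not shown.

```
.
├── Dockerfile
├── dist/
├── node_modules/
├── package.json
├── src/
├── tsconfig.json
└── yarn.lock
```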
There is a src folder, a node_modules folder, a package.json file, a Dockerfile file, and a dist folder that contains the build output of yarn build.
Here is an unoptimized Dockerfile for this project that we may write.
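A minimal sketch of what such a Dockerfile could look like, pieced together from the instructions this post refers to (FROM node:16, COPY . ., RUN yarn install --immutable, and RUN yarn build). The WORKDIR path and the CMD entry point are assumptions added for illustration and are not taken from the original project.

```dockerfile
# Full-size Debian-based Node image
FROM node:16

# Assumed working directory; any path would do
WORKDIR /app

# Copies the entire build context, including node_modules, dist, and .git
COPY . .

# Reinstall dependencies and rebuild inside the image
RUN yarn install --immutable
RUN yarn build

# Hypothetical entry point; adjust to the project's real build output
CMD ["node", "dist/index.js"]
```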
This kind of Dockerfile isn't uncommon; we see ones like it in the wild all the time. But if we build the image and then check its final size using the following commands:
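The commands would be along these lines, assuming the image is tagged example-image (the name used in the rest of this post) and that they are run from the project root with a Docker daemon available:

```shell
# Build the image from the Dockerfile in the current directory and tag it
docker build -t example-image .

# List the image along with its size
docker images example-image
```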
We see that the image size is 1.5 GB. That seems quite large for this example Node application and our Dockerfile above.
Why is my Docker image so large?
It's a common question once you start seeing eye-opening image sizes for seemingly innocent Dockerfiles. Luckily, there are tools to help answer this question.
The open-source project dive is an excellent tool for analyzing a Docker image. It allows us to view each layer of an image, including the layer size and what files are inside.
We can use dive on the example-image we just built:
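Assuming dive is installed and on the PATH, it takes the image tag as its argument, so the invocation would look something like this:

```shell
# Open the interactive layer explorer for the image built earlier
dive example-image
```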
As seen above, the terminal UI of dive shows us the layers that make up the image on the left-hand side.
The right side, as seen here, shows us the filesystem of the selected layer. It shows what files were added, removed, or modified between the layer selected and the parent before it.
Our first image above shows that the first nine layers are all related to the base image, FROM node:16, for a combined size of ~910 MB. That's large but not surprising, considering we use the node:16 image as our base.
The next interesting layer is the eleventh one, the COPY . . step, with a total size of 145 MB. For the project we are building, that is much larger than expected. Using the filesystem pane of dive, we can see the files that command added to the layer.
Now things get a little more interesting. Analyzing the layer, we can see it contains our entire project directory, including directories like dist and node_modules that we recreate with later RUN steps. Now that we have spotted the first problem with our image size, we can start implementing solutions to slim it down.
Reducing image size
Now that we have insights into what is in our image via dive, we can reduce the final Docker image size using three different techniques.
- Add a .dockerignore file to our project to exclude unnecessary files or directories.
- Change our Dockerfile to use smaller base images.
- Use multi-stage builds to exclude unnecessary artifacts from earlier stages in the final image.
Add a .dockerignore file
A .dockerignore file instructs Docker to skip files or directories during docker build. Files or directories that match a pattern in .dockerignore won't be copied by any ADD or COPY statements. As a result, they never appear in the final built image.
The .dockerignore syntax is similar to that of a .gitignore file. We can add a .dockerignore file to the root of our project that ignores all the unneeded files for our example image build.
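A minimal sketch of such a file, limited to the three paths this post names. A real project may want to ignore more, such as logs or editor settings.

```
# Recreated inside the image by RUN yarn install --immutable
node_modules

# Recreated inside the image by RUN yarn build
dist

# Version-control metadata, not needed at runtime
.git
```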
Here we exclude files that are recreated as part of our Dockerfile, like node_modules, which is installed via the RUN yarn install --immutable step. We also exclude unnecessary folders like .git and dist, the RUN yarn build output.
With this small change, we can rebuild our image and recheck its size.
The size is now 1.3 GB instead of 1.5 GB, so we have already shaved off 200 MB from our image size!
Looking at the COPY layer via dive again, we see that the node_modules folder and the other paths listed in our .dockerignore are gone, bringing the layer size down from 145 MB to less than 400 KB.
Shave lots of bytes with smaller base images
Slim base images can provide dramatic reductions in image size. But they do come with tradeoffs that are worth considering. For example, as we will see, the alpine base image provides a massive size reduction, but it comes with its own, more limited package manager, apk. However, for most use cases, this limitation is manageable and can often be worked around.
For our example, we don't mind the tradeoffs presented by the node:16-alpine base image, so we can plug it into our Dockerfile and run a new build.
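As a sketch, the only line that needs to change is the FROM instruction. The rest of the Dockerfile is assumed to stay exactly as it was.

```dockerfile
# Before: FROM node:16
# After: the Alpine-based variant of the same Node version
FROM node:16-alpine
```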
Changing the base image to alpine brings the number of base layers down from nine to five and reduces the total image size from 1.3 GB to 557 MB, nearly 3x smaller than the original 1.5 GB image.
Leverage multi-stage builds
A multi-stage build allows you to specify multiple FROM statements in a single Dockerfile, and each of them represents a new stage in a build. You can also copy files from one stage to another. Files not copied from an earlier stage are discarded in the final image, resulting in a smaller size.
Here is what our example Dockerfile looks like with an optimized multi-stage build.
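One way such a Dockerfile could be written, based on the stages described in this post. The stage name build comes from the post. The WORKDIR path, the COPY of src, the base image of the final stage, and the CMD entry point are assumptions made to keep the sketch complete.

```dockerfile
# Stage 1: install dependencies and compile the project
FROM node:16-alpine AS build
WORKDIR /app

# Copy only the files needed to install dependencies and configure the build
COPY package.json yarn.lock tsconfig.json ./
RUN yarn install --immutable

# Assumption: the sources live in src and are all the build needs
COPY src ./src
RUN yarn build

# Stage 2: the final image, which starts from a clean base
FROM node:16-alpine
WORKDIR /app

# Bring over only the installed dependencies and the build output
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist

# Hypothetical entry point; adjust to the project's real build output
CMD ["node", "dist/index.js"]
```

Everything else from the build stage, including the sources and any intermediate files, stays behind because it is never copied into the final stage.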
The first stage copies in the package.json, yarn.lock, and tsconfig.json files so that node_modules can be installed and the application can be built.
The second stage copies the node_modules and dist folders from the first stage, build, into the final image. The items not copied from the first stage get discarded. We no longer have a COPY . . step either; instead, we only copy in the node_modules and the build output of our project, the dist folder.
If we build this example with a multi-stage build, we can bring the total image size down to 315 MB. That's more than a 4x reduction in image size from the original 1.5 GB.
The benefits of reducing Docker image size
Smaller images build and deploy faster. But speed is only one of the benefits of keeping your container images small. The smaller the image is, the less complex it is as well. A less complex image has fewer binaries and packages inside it and, by extension, fewer pathways for vulnerabilities to exist.
Using the three techniques we covered in this post, and dive to analyze the contents of our images, we can drastically reduce the size of Docker images so that they build and run faster. We also make them less complex, easier to reason about, and more secure.