As we explored in Docker layer caching for GitHub Actions, making use of the existing layer cache as frequently as possible is key to speeding up Docker image builds. The less work we have to redo across builds, the faster our builds will be.
In this post, we will look at reducing the overall image size, to improve build time and reduce the amount of data transferred in each image. We will use a popular open-source project,
dive, to help analyze a Docker image, stepping through each individual layer to see what files it adds to the image and how it impacts the total image size.
dive into a Docker image
The open-source project
dive is a great tool for analyzing a Docker image. It allows you to view each individual layer of an image, including the layer size and what files are contained inside.
For this post, we will use an example Node project with a common Dockerfile someone may write when just getting things started. It has the following directory structure:
.
├── Dockerfile
├── README.md
├── dist
│   ├── somefile1.d.ts.map
│   ├── somefile1.js
├── node_modules
├── package.json
├── src
│   ├── index.ts
├── tsconfig.json
├── yarn-error.log
└── yarn.lock
There is a
src folder, a
node_modules folder, a
package.json file, a
Dockerfile file, and a
dist folder that contains the build output of
yarn build. Here is an unoptimized
Dockerfile for this project that has the potential to produce bigger image sizes than we would like.
FROM node:16
WORKDIR /app
COPY . .
RUN yarn install --immutable
RUN yarn build
CMD ["node", "./dist/index.js"]
This looks innocent enough. But if we build the image using
docker build -t example-image . we see that the image size is 1.5 GB. That seems large for something as trivial as a simple Node application represented in the
Dockerfile. Let's use
dive to analyze the layers of the image to try to determine why it's so large.
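The inspection described here can be reproduced from a terminal. The following is a minimal sketch, assuming Docker and dive are both installed and that the commands run from the project root; `inspect_image` is a hypothetical helper name, not something Docker or dive provides.

```shell
# Sketch: build the image, print its size and layer history, then open it in dive.
# Assumes docker and dive are on the PATH; inspect_image is a made-up helper name.
inspect_image() {
  image="$1"
  docker build -t "$image" . || return 1
  docker image ls "$image"   # SIZE column: total image size
  docker history "$image"    # one row per layer, with its size and creating instruction
  dive "$image"              # interactive view of layers and their files
}

# Usage: inspect_image example-image
```

Calling `inspect_image example-image` ends in dive's interactive view. `docker history` is included only as a quick, non-interactive cross-check of the per-layer sizes.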
On the left we can see each layer that makes up the image, and on the right, the filesystem of the selected layer. The filesystem pane shows what files were added, removed, or modified between the layer selected and the parent before it.
Looking at our image from above, we see that the first nine layers are all related to the base image,
FROM node:16, for a summed size of ~910 MB. That's large, but also not surprising considering we are using the
node:16 image as our base.
The next interesting layer is the eleventh one, created by COPY . ., with a total size of 145 MB. This is a lot larger than expected considering the project we are building. Using the filesystem pane of dive, we can see which files that command added to the layer.
Now things get a little more compelling. By analyzing the layer, we can see it contains our entire project directory, including directories like
node_modules that we expected to recreate with future
RUN steps. Now that we have spotted the first problem with our image size, we can start implementing solutions to slim it down.
Having seen how dive exposes the contents of each layer, we will apply three techniques in turn: a .dockerignore file, a smaller base image, and a multi-stage build.
.dockerignore is a file that instructs Docker to skip certain files or directories during
docker build. If a file or directory matches one of the patterns in
.dockerignore, it won't be copied with any
COPY statements, and thus won't appear in the final built image. The
.dockerignore file is similar to a
.gitignore file, and uses a similar syntax.
We can add a
.dockerignore file that ignores all the files that aren't needed for our image build.
node_modules
Dockerfile*
.git
.github
.gitignore
dist/**
README.md
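Before writing the ignore file, it can help to see which entries in the project directory are actually large. The following is a minimal sketch, assuming a POSIX shell with `du`, `sort`, and `head` available; `largest_entries` is a hypothetical helper name, and its output is only a rough guide to what ends up in the build context.

```shell
# List the ten largest top-level entries of a directory, biggest first.
# Large entries that the build recreates anyway (such as node_modules or dist)
# are good candidates for .dockerignore.
largest_entries() {
  dir="${1:-.}"
  (
    cd "$dir" || exit 1
    # -s: one total per entry, -k: sizes in kilobytes.
    # The two globs cover hidden entries (like .git) and regular ones.
    du -sk .[!.]* * 2>/dev/null | sort -rn | head -n 10
  )
}

# Usage: largest_entries .
```

Each output line is a size in kilobytes followed by the entry name, so a locally installed node_modules directory would typically appear at or near the top.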
With this small change, we can build our image and see that the size is now 1.3 GB instead of 1.5 GB. Looking at the
COPY layer via
dive again, we see that we have removed the
node_modules folder and files that weren't needed for the final image, bringing that layer size down from 145 MB to less than 400 KB.
It may initially seem like excluding directories like node_modules is a mistake; after all, our application does use dependencies. The key is that the RUN yarn install --immutable line in our Dockerfile is responsible for installing those dependencies. The .dockerignore file only keeps the local node_modules directory from being copied in by COPY . ., which would otherwise put a second, unnecessary copy of that directory in the image.
To get dramatic reductions in image size, it's often worth considering a slim base image. The
alpine images are very popular as base images because they tend to be under 5 MB in size. They do come with tradeoffs though, so when considering using one for your language or framework it's worth understanding the limitations of using them.
For our example, we don't mind the tradeoffs presented for the
node:16-alpine base image, so we can plug it into our
Dockerfile and run a new build.
- FROM node:16
+ FROM node:16-alpine
...
Changing the base image brings the base layers down from nine to five, reducing the total image size from 1.3 GB to 557 MB. That is a significant improvement over the original image size of 1.5 GB.
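To see how much of the saving comes from the base image alone, the two bases can be pulled and listed side by side. This is a sketch, assuming Docker is installed and can reach the registry that hosts the `node` images; `compare_base_images` is a hypothetical helper name.

```shell
# Pull both base images and print the size of each.
compare_base_images() {
  for tag in node:16 node:16-alpine; do
    docker pull --quiet "$tag" >/dev/null || return 1
    # Print "repository:tag size" for the image that was just pulled.
    docker image ls --format '{{.Repository}}:{{.Tag}} {{.Size}}' "$tag"
  done
}

# Usage: compare_base_images
```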
A multi-stage build allows you to specify multiple
FROM statements in a single Dockerfile, and each of them represents a new stage in a build. You can also copy files from one stage to another. Files not copied from an earlier stage are discarded in the final image, resulting in a smaller image size.
Here is what our example Dockerfile looks like with a highly optimized multi-stage build.
FROM node:16-alpine AS build
WORKDIR /app
COPY package.json yarn.lock tsconfig.json ./
RUN yarn install --immutable
COPY src/ ./src/
RUN yarn build

FROM node:16-alpine
WORKDIR /app
COPY --from=build /app/node_modules /app/node_modules
COPY --from=build /app/dist /app/dist
CMD ["node", "./dist/index.js"]
The first stage copies in the package.json, yarn.lock, and tsconfig.json files so that node_modules can be installed and the application can be built. The second stage copies the node_modules and dist folders into the final image, discarding the rest of the first stage. Notice that we don't copy in the entire directory of our project; we only copy in the node_modules and the build output of our project, the dist folder.
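One way to see what the second stage leaves behind is to build the first stage on its own and compare it with the final image. This is a sketch, assuming the multi-stage Dockerfile above with its first stage named `build`; the `example-image:build-stage` tag and the `compare_stages` helper name are made up for illustration.

```shell
# Build the first stage and the final image separately, then list both sizes.
compare_stages() {
  # --target stops the build at the named stage instead of the last one.
  docker build --target build -t example-image:build-stage . || return 1
  # Without --target, the build runs through to the final stage.
  docker build -t example-image . || return 1
  # List every tag of example-image; the SIZE column shows the difference.
  docker image ls example-image
}

# Usage: compare_stages
```

The gap between the two sizes is roughly what the first stage holds beyond node_modules and dist, such as the source files.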
With these changes, the total image size is now 315 MB, down from the original 1.5 GB. By adding a .dockerignore file, using a smaller base image, and leveraging multi-stage builds, we made the image nearly five times smaller.
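The "nearly five times smaller" figure follows from the sizes reported in this post. Here is a quick check, assuming 1.5 GB and 1.3 GB are read as 1500 MB and 1300 MB; `report` is a hypothetical helper name.

```shell
# Print how many times smaller a given size is than the original 1500 MB image.
# Sizes are in MB, taken from the measurements quoted in this post.
report() {
  awk -v label="$1" -v size="$2" -v base=1500 \
    'BEGIN { printf "%-14s %4d MB  %.1fx smaller\n", label, size, base / size }'
}

report "original"      1500
report ".dockerignore" 1300
report "alpine-base"   557
report "multi-stage"   315
```

The last line reports 4.8x, since 1500 / 315 is about 4.76.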
Smaller images build and deploy faster.
By being careful to reduce our build context size with
.dockerignore, we can reduce the time it takes to transfer that context into the build and keep unnecessary files out of the final image. And by using a smaller base image and multi-stage builds, we can reduce the overall image size, reducing the amount of time and network bandwidth it takes to pull the image and launch it.
Luckily, tools like
dive make it straightforward to understand what each layer in your image contains, so that you can diagnose issues and see the effects of your optimizations.