When you pull (or push) a Docker image from a registry, what really happens behind the scenes? When you request an image, Docker first retrieves a manifest from the registry. This manifest contains crucial information about the image, such as its architecture, configuration, and, most importantly, a list of layers (individual filesystem changes) that need to be downloaded.
By default, Docker checks the manifest for a build that matches your system's architecture (e.g., `linux/amd64`). If it finds a match, Docker retrieves a manifest specific to that architecture, which includes the layers that make up the image.
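You can see this for yourself by asking the registry for an image's manifest without pulling it. A quick sketch, using `ubuntu:latest` as an example (output heavily abbreviated, with an illustrative size and an elided digest):

```shell
# --verbose includes the per-platform manifests along with their layer lists
$ docker manifest inspect --verbose ubuntu:latest
...
"SchemaV2Manifest": {
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "layers": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 29534702,
            "digest": "sha256:..."
        }
    ]
}
...
```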
Note that under `layers`, each layer's `mediaType` ends in `tar+gzip`. As you pull the image, Docker downloads these compressed layers and then extracts them into a filesystem on your machine. This is why you see messages like `Downloading`, followed by `Extracting`, during the pull process (the output below is abbreviated; the layer ID and sizes are illustrative):
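```shell
$ docker pull ubuntu:latest
latest: Pulling from library/ubuntu
445a6a12be2b: Downloading [=========>        ]  17.4MB/29.5MB
445a6a12be2b: Extracting  [======>           ]  12.1MB/29.5MB
```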
When you push an image, Docker will automatically compress the layers with `gzip` before sending them to the registry. However, the `buildx build` or `depot build` commands give us much more control over our build process. We can specify the compression method we want to use while building and pushing using the `--output` flag.
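For example, here's a sketch of building and pushing an image with `zstd`-compressed layers (the image name is a placeholder for your own registry):

```shell
# Build, compress the layers with zstd, and push in one step
docker buildx build \
  --output type=image,name=registry.example.com/app:latest,push=true,compression=zstd \
  .
```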
You can see in the BuildKit README that the `compression` option can be set to `uncompressed`, `gzip` (the default), `zstd`, or `estargz`. We'll do a deeper dive on `estargz` in an upcoming post, as that is a bit of a special case. But what about the other options?
## What is Gzip?
Gzip is the default compression method used by BuildKit and Docker (for instance, when you run the `docker push` command). Gzip is a wrapper around the DEFLATE algorithm and has been the standard for general compression since the '90s.
Unlike `zip`, which compresses each file individually, gzip compresses a single stream, so files are first concatenated together with `tar` before being compressed, allowing for better compression than compressing each file on its own. The final compressed collection of files is called a tarball and has the extension `.tar.gz`.
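For example, a minimal invocation (the directory name is a placeholder):

```shell
# Create (-c) a gzip-compressed (-z) archive and write it to the file (-f) app.tar.gz
tar -czf app.tar.gz ./app
```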
The `-z` flag tells `tar` to use `gzip` to compress the files.
One common complaint about `gzip` is that it is still a single-threaded application. Likely for compatibility reasons, given how ubiquitous and long-standing `gzip` is, it has never been updated to take advantage of modern multi-core processors. There is, however, a parallel implementation of `gzip` called `pigz` that can take advantage of multiple cores when compressing or decompressing files.
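Since `pigz` produces ordinary gzip output, you can swap it in with GNU `tar` directly; a small sketch (the archive and directory names are placeholders):

```shell
# Compress through pigz instead of gzip; the result is still a normal .tar.gz
tar --use-compress-program=pigz -cf app.tar.gz ./app
```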
We've talked before about how compression isn't free, but the cost is usually made up for in the time saved by transferring smaller files over the network. Depending on your specific use case and data, you may want to consider using a different compression method, or maybe even no compression at all.
If you were dealing with a machine that does not have a lot of spare CPU cycles, but does have a strong network connection, it would potentially be faster to pull uncompressed layers from your registry. However, there might be a best-of-both-worlds scenario with a newer compression algorithm like `zstd`.
## What is Zstd?
Zstandard (zstd) was developed by Facebook in 2015 and open-sourced a year later. Since then, it has been rapidly gaining support thanks to a number of modern features, most significantly its strong decompression performance. Zstandard was designed to provide compression ratios similar to `gzip`, but with much faster decompression speeds.
One of the largest differences out of the box is that `zstd` is a multi-threaded tool, while the time-tested `gzip` is still a single-threaded application. At its core, Zstandard shares some similarities with `gzip`: just as `gzip` is a wrapper around DEFLATE, Zstandard uses a combination of LZ77 dictionary matching (as used in DEFLATE) with a different entropy encoding scheme.
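On the command line, the multi-threading difference is a single flag away; a quick sketch (the file name is a placeholder):

```shell
# gzip: always single-threaded
gzip -k app.tar          # -k keeps the original file around

# zstd: opt in to multi-threading with -T (0 means use all available cores)
zstd -T0 app.tar         # writes app.tar.zst; keeps the original by default
```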
## Benchmark: Gzip vs Zstd
This is not a comprehensive real-world benchmark, but this should give us a rough simulation of how the different compression methods will compare in the context of building and pushing a container image.
In this test, we will build the PostHog/posthog Docker image as an uncompressed tarball and then compress and decompress it using both `gzip` and `zstd`. In actuality, each layer would be compressed individually and there would be a little additional overhead from verification checks, but we will be running everything on the same machine (`depot-ubuntu-24.04-16`) with 16 CPU cores, so the relative performance should be similar.
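In rough strokes, the test looks like the following sketch (file names are placeholders, and the build is assumed to run from the posthog repository's build context):

```shell
# Export the built image as a single uncompressed tarball
docker buildx build --output type=tar,dest=posthog.tar .

# Compress with each tool, writing to separate files so the runs don't collide
time gzip -c posthog.tar > posthog-gzip.tar.gz
time pigz -c posthog.tar > posthog-pigz.tar.gz
time zstd -T0 -o posthog.tar.zst posthog.tar
time zstd -T0 -9 -o posthog-9.tar.zst posthog.tar

# Decompress each result, discarding the output
time pigz -dc posthog-pigz.tar.gz > /dev/null
time zstd -dc posthog.tar.zst > /dev/null
```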
## Comparison of Compression Ratios
By default, `zstd` is designed to perform similarly to `gzip` in compression ratio, though we can see it still has a slight edge.
Both `gzip` and `zstd` have tunable compression levels, with `zstd` offering a much larger range of options. In this test, we used the default compression levels for both `gzip` and `zstd`. Additionally, I've included `zstd` level 9, as it is a common choice for high compression without sacrificing too much speed.
## Comparison of Compression Times
Clearly, being limited to a single thread is extremely detrimental to our build times. Interestingly, `pigz` (the parallel implementation of `gzip`) will be used by `containerd` for decompression if it is installed on the host machine, but not for compression.
Single-stream `gzip` treats the whole file as a single chunk, which yields excellent compression ratios but is limited by the speed of a single thread. Gzip has an alternate mode that supports chunked compression, where the file is subdivided, each chunk is compressed separately, and the results are finally concatenated together into a single valid gzip stream. This is ultimately the property that `pigz` leans on, compressing each chunk in parallel on a separate thread and completing the work an order of magnitude faster.
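You can demonstrate the underlying property with plain `gzip`: independently compressed chunks, concatenated together, still form a valid gzip stream:

```shell
# Compress two chunks separately
echo "hello" | gzip > part1.gz
echo "world" | gzip > part2.gz

# The concatenation decompresses as one continuous stream
cat part1.gz part2.gz | gunzip
# hello
# world
```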
This sounds like a win all around. However, even with `pigz` present on the machine, Docker will still by default compress image layers using single-stream `gzip` for ecosystem compatibility. Although both `gzip` and `pigz` produce correct layers, they arrive at their compressed output slightly differently and thus produce layers of different sizes and hashes. When those layers are ultimately pushed to a registry, they'll be recognized as different images despite having identical uncompressed content. Docker as a platform is making a tradeoff here, preferring canonical images in Docker Hub at the expense of slower compression speeds.
Depot, on the other hand, aims to be the fastest place to build software, so we've placed a higher premium on compression speed. We've modified our builders to use `pigz` for faster compression out of the box. So if you are currently building with the `depot build` command, you are already taking advantage of multi-threaded compression.
Zstandard is natively supported by `containerd` and can be used for both compression and decompression. Standard `zstd` compression is within a margin of error of `pigz` for compression time, and will theoretically be faster for decompression.
In this test, using `zstd` at level 9 significantly increased the compression time while only providing a modest improvement in the compression ratio. It's a good idea to run similar benchmarks with your own container builds to see what will ultimately be the best choice for you over the long term.
## Comparison of Decompression Times
Often the most pertinent metric is decompression time: for a Kubernetes deployment that needs to scale up quickly, we need to ensure that our images can be decompressed quickly.
Zstandard is significantly faster in our test than `pigz`, and interestingly, decompressing `zstd-9` took roughly the same time (actually slightly less in this run) as the default `zstd` compression level. So with a little more compute up front, some bandwidth can be saved in the pull process without affecting the decompression time.
Luckily, as we said before, `pigz` will be used by `containerd` for decompression if it is installed on the host machine, so we can still take advantage of parallel decompression in most situations. If you are currently using gzipped images, which is all images by default, you should definitely consider installing `pigz` on your host machine.
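On a Debian or Ubuntu host, for example, that's a one-line install:

```shell
# containerd and Docker will automatically prefer pigz for decompression once present
sudo apt-get install -y pigz
```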
## Conclusion
Determining the best option often comes down to various factors, such as data type, core count, and network speed. But in general, adding `zstd` to your builds is an easy "one-liner" improvement that can save time, bandwidth, and potentially money.
Where `zstd` truly shines is in its decompression speed, which in this test was nearly 60% faster than `pigz` while producing a smaller file than `gzip` in the compression stage. That is an easy way to get your pods starting roughly twice as fast or to save on your CI/CD bill.
If you are building images with `depot build`, you are already benefiting from multi-threaded compression in your build process via `pigz`. Whether you are building images with `depot build` or `docker buildx build`, you can still benefit from faster decompression times by using `zstd` for your images, setting the `--output` flag to `compression=zstd`.
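For example, a sketch with `depot build` (the image name is a placeholder; `depot build` accepts the same `--output` syntax as buildx):

```shell
# Build and push with zstd-compressed layers; add force-compression=true if you
# also want layers inherited from the base image re-compressed with zstd
depot build \
  --output type=image,name=registry.example.com/app:latest,push=true,compression=zstd \
  .
```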