
Building Images: Gzip vs Zstd

Written by Kyle Tryon
Published on 22 October 2024
Docker images are downloaded as compressed layers and default to gzip compression. But is there a better option? We compare gzip and zstd compression methods to see which is best for building and pushing images.

When you pull (or push) a Docker image from a registry, what really happens behind the scenes? When you request an image, Docker first retrieves a manifest from the registry. This manifest contains crucial information about the image, such as its architecture, configuration, and most importantly, a list of layers (individual file system changes) that need to be downloaded.

By default, Docker checks the manifest for a build that matches your system’s architecture (e.g., linux/amd64). If it finds a match, Docker retrieves a manifest specific to that architecture, which includes the layers that make up the image.
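
The platform lookup described above can be sketched in a few lines of Python. This is a simplified illustration, not Docker's actual selection logic (which also considers CPU variants and fallbacks). The index below is a hypothetical, trimmed OCI image index, and its digests are placeholders rather than real images.

```python
import json

# Hypothetical, trimmed OCI image index; the digests are placeholders.
INDEX_JSON = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:placeholder-amd64",
      "platform": {"architecture": "amd64", "os": "linux"}
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:placeholder-arm64",
      "platform": {"architecture": "arm64", "os": "linux"}
    }
  ]
}
"""


def select_manifest(index, os_name, architecture):
    """Return the digest of the first manifest matching the platform, or None."""
    for entry in index["manifests"]:
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == architecture:
            return entry["digest"]
    return None


index = json.loads(INDEX_JSON)
print(select_manifest(index, "linux", "amd64"))
```

The returned digest is what the client would then use to request the architecture-specific manifest.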

docker manifest inspect hello-world@sha256:e2fc4e5012d16e7fe466f5291c476431beaa1f9b90a5c2125b493ed28e2aba57
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:d2c94e258dcb3c5ac2798d32e1249e42ef01cba4841c2234249495f87264ac5a",
    "size": 581
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:c1ec31eb59444d78df06a974d155e597c894ab4cda84f08294145e845394988e",
      "size": 2459
    }
  ],
  "annotations": {
    "org.opencontainers.image.revision": "3fb6ebca4163bf5b9cc496ac3e8f11cb1e754aee",
    "org.opencontainers.image.source": "https://github.com/docker-library/hello-world.git#3fb6ebca4163bf5b9cc496ac3e8f11cb1e754aee:amd64/hello-world",
    "org.opencontainers.image.url": "https://hub.docker.com/_/hello-world",
    "org.opencontainers.image.version": "linux"
  }
}

Under layers[].mediaType, you can see that the layer's media type ends in tar+gzip. As you pull the image, Docker downloads these compressed layers and then extracts them into a filesystem on your machine. This is why you see messages like Downloading, followed by Extracting, during the pull process:

Pulling from library/rust
c1e0ef7b956a: Pull complete
143d0f027108: Extracting [========================>      ]  31.75MB/64.35MB
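
To check which compression an image's layers use without reading the JSON by eye, the media type suffix can be inspected programmatically. The sketch below embeds a trimmed copy of the hello-world manifest shown earlier. The helper name layer_compression is our own, and it only recognizes the gzip and zstd suffixes.

```python
import json

# Trimmed copy of the hello-world manifest shown earlier in this post.
MANIFEST_JSON = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:c1ec31eb59444d78df06a974d155e597c894ab4cda84f08294145e845394988e",
      "size": 2459
    }
  ]
}
"""


def layer_compression(media_type):
    """Classify a layer media type by its compression suffix.

    Handles OCI-style "+gzip" / "+zstd" and Docker-style ".gzip" suffixes;
    anything else is reported as uncompressed.
    """
    for suffix in ("gzip", "zstd"):
        if media_type.endswith(("+" + suffix, "." + suffix)):
            return suffix
    return "uncompressed"


manifest = json.loads(MANIFEST_JSON)
for layer in manifest["layers"]:
    print(layer["digest"][:19], layer_compression(layer["mediaType"]), layer["size"])
```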

When you push an image, Docker will automatically compress the layers with gzip before sending them to the registry. However, the buildx build or depot build commands give us much more control over our build process. We can specify the compression method we want to use while building and pushing using the --output flag.

depot build . --output type=image,name=<registry>/<namespace>/<repository>:<tag>,compression=<compression method>,oci-mediatypes=true,platform=linux/amd64

As the BuildKit README shows, the compression option can be set to uncompressed, gzip (default), zstd, or estargz. We'll do a deeper dive on estargz in an upcoming post, as that is a bit of a special case. But what about the other options?

What is Gzip?

Gzip is the default compression method used by BuildKit and Docker (for instance, when you run the docker push command). Gzip is a wrapper around the DEFLATE algorithm and has been the standard for general-purpose compression since the 1990s.

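Gzip compresses a single stream of data, so multiple files are first concatenated into a tar archive and then compressed as a whole. Unlike zip, which compresses each file individually, this lets the compressor exploit redundancy across files, which often yields better compression. The final compressed archive is called a tarball and has the extension .tar.gz.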

# The `-z` tells `tar` to use `gzip` to compress the files.
tar -czvf myfiles.tar.gz myfiles

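
To see why compressing one tar stream can beat compressing files individually, here is a small self-contained sketch using Python's standard library. The file names and contents are made up for illustration: every file shares the same 2 KB of pseudo-random bytes, which is incompressible within a single file but highly redundant across files.

```python
import gzip
import io
import random
import tarfile

rng = random.Random(0)
# 2 KB of pseudo-random bytes: incompressible alone, but shared by every file.
shared = bytes(rng.getrandbits(8) for _ in range(2000))
files = {f"file{i}.bin": shared + str(i).encode() for i in range(20)}

# zip-style: compress every file on its own and add up the sizes.
individual_total = sum(len(gzip.compress(data)) for data in files.values())

# tar.gz-style: bundle into one tar stream, then compress the whole stream.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
tarball_size = len(buffer.getvalue())

print(f"compressed individually: {individual_total} bytes")
print(f"single tar.gz:           {tarball_size} bytes")
```

This is a deliberately favorable case. With real data the gap depends on how much content is repeated between files and whether the repeats fall within DEFLATE's 32 KB window.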

One common complaint about gzip is that it is still a single-threaded application. For one reason or another, probably for compatibility given how ubiquitous and long-standing gzip is, it has not been updated to take advantage of modern multi-core processors. There is, however, a parallel implementation of gzip called pigz that can take advantage of multiple cores when compressing or decompressing files.

We've talked before about how compression isn't free, but it is usually made up for in the time saved by transferring smaller files over the network. Depending on your specific use-case and data, you may want to consider using a different compression method, or maybe even no compression at all.

If you were dealing with a machine that does not have a lot of spare CPU cycles, but does have a strong network connection, it would potentially be faster to pull uncompressed layers from your registry. However, there might be a best-of-both-worlds scenario with a newer compression algorithm like zstd.
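
A rough back-of-the-envelope model makes this tradeoff concrete. Every number below is hypothetical and chosen only for illustration, and the model ignores the fact that download and decompression can overlap.

```python
def pull_seconds(image_mb, ratio, network_mb_s, decompress_mb_s=None):
    """Estimate pull time: transfer of the compressed bytes plus decompression.

    ratio is uncompressed size / compressed size. decompress_mb_s=None means
    the layers are stored uncompressed, so there is nothing to decompress.
    """
    transfer = (image_mb / ratio) / network_mb_s
    decompress = 0.0 if decompress_mb_s is None else image_mb / decompress_mb_s
    return transfer + decompress


IMAGE_MB = 1000   # hypothetical uncompressed image size
GZIP_RATIO = 3.0  # hypothetical compression ratio
GZIP_MB_S = 100   # hypothetical decompression throughput on a busy CPU

for label, network_mb_s in (("fast network", 1000), ("slow network", 10)):
    raw = pull_seconds(IMAGE_MB, 1.0, network_mb_s)
    gz = pull_seconds(IMAGE_MB, GZIP_RATIO, network_mb_s, GZIP_MB_S)
    print(f"{label}: uncompressed {raw:.1f}s, gzip {gz:.1f}s")
# fast network: uncompressed 1.0s, gzip 10.3s
# slow network: uncompressed 100.0s, gzip 43.3s
```

With these made-up inputs, the uncompressed pull wins on the fast network and loses on the slow one, which is the crossover the paragraph above describes.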

What is Zstd?

Zstandard (zstd) was developed by Facebook in 2015 and open-sourced a year later. Since then it has been rapidly gaining support due to a number of modern features, but most significantly its strong decompression performance. Zstandard was designed with the intent of providing similar compression ratios to gzip, but with much faster decompression speeds.

One of the largest differences out of the box is that zstd is a multi-threaded tool, while the time-tested gzip is still a single-threaded application. At its core, Zstandard shares some similarities with gzip. Just as gzip is a wrapper around DEFLATE, which pairs LZ77 dictionary matching with Huffman coding, Zstandard combines LZ77-style dictionary matching with a different entropy coding stage (Finite State Entropy alongside Huffman coding).

Benchmark Gzip vs Zstd

This is not a comprehensive real-world benchmark, but this should give us a rough simulation of how the different compression methods will compare in the context of building and pushing a container image.

In this test, we will build the PostHog/posthog Docker image as an uncompressed tarball and then compress and decompress it using both gzip and zstd. In practice, each layer would be compressed individually and there would be a little additional overhead in making multiple verification checks, but we will be running these all on the same machine (depot-ubuntu-24.04-16) with 16 CPU cores, so the relative performance should be similar.

Comparison of Compression Ratios

By default, zstd is designed to perform similarly in compression ratio to gzip, though we can see it still has a slight edge.

Both gzip and zstd have tunable compression levels, with zstd having a much larger range of options. In this test, we used the default compression levels for both gzip and zstd. Additionally, I've included zstd level 9 as it is a common choice for high compression without sacrificing too much speed.
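
The effect of compression levels is easy to explore for DEFLATE with Python's standard zlib module, which uses the same algorithm as gzip with a different container format. The input below is synthetic pseudo-text, so the sizes say nothing about real image layers. Level 6 corresponds to the gzip command's default. Zstandard is not covered here because it only entered the Python standard library in 3.14.

```python
import random
import zlib

rng = random.Random(0)
words = ["layer", "image", "docker", "registry", "manifest",
         "build", "push", "pull", "cache", "digest"]
# Synthetic sample input; substitute bytes from a real layer tarball if you have one.
data = " ".join(rng.choice(words) for _ in range(50_000)).encode()

sizes = {}
for level in (1, 6, 9):
    compressed = zlib.compress(data, level)
    # Every level must round-trip to the same bytes; only size and speed differ.
    assert zlib.decompress(compressed) == data
    sizes[level] = len(compressed)
    print(f"level {level}: {sizes[level]} bytes ({len(data) / sizes[level]:.2f}x)")
```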

Comparison of Compression Times

Clearly, being limited to a single thread is extremely detrimental to our build times. Interestingly, pigz (the parallel implementation of gzip) will be used by containerd for decompression if it is installed on the host machine, but not for compression.

Single-stream gzip considers the whole file as a single chunk, meaning excellent compression ratios but limited by the speed of a single thread. The gzip format also supports chunked compression, where the file is subdivided, each chunk is compressed separately, and the results are concatenated to form the final compressed file. This is ultimately the property that pigz leans on, compressing each chunk in parallel in a separate thread, completing work an order of magnitude faster.
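
The chunk-and-concatenate property can be demonstrated with Python's standard library. This is not how pigz is implemented internally. It is only a sketch of the underlying idea that independently compressed gzip members, joined end to end, still decompress as one stream. The chunk size and worker count are arbitrary choices, and threads help here only to the extent that CPython's zlib releases the GIL while compressing.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def compress_chunk(chunk):
    """Compress one chunk into a complete, standalone gzip member."""
    return gzip.compress(chunk, compresslevel=6)


def parallel_gzip(data, chunk_size=1 << 20, workers=4):
    """Split data into chunks, compress them concurrently, and concatenate."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(compress_chunk, chunks))  # map preserves order
    return b"".join(members)


data = b"example layer contents\n" * 200_000
single = gzip.compress(data, compresslevel=6)
chunked = parallel_gzip(data)

# A standard gzip reader treats the concatenated members as one stream.
print("round-trips:", gzip.decompress(chunked) == data)
print(f"single stream: {len(single)} bytes, chunked: {len(chunked)} bytes")
```

Because each chunk starts with an empty dictionary and carries its own header and trailer, the chunked output is usually slightly larger than the single-stream result.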

This sounds like a win all around. However, even with pigz present on the machine, Docker will still by default compress image layers using single-stream gzip for ecosystem compatibility. Although both gzip and pigz produce correct layers, they arrive at their tarballs slightly differently and thus produce layers of different sizes and hashes. When those layers are ultimately pushed to a registry, they'll be recognized as different images despite having identical uncompressed content. Docker as a platform is making a tradeoff here, preferring canonical images in Docker Hub at the expense of slower compression speeds.
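
The digest mismatch is easy to reproduce. In an image manifest, each layer's digest is computed over the compressed blob, so two valid gzip encodings of the same tar produce two different layer digests. The sketch below uses made-up content and a two-member stream as a stand-in for a parallel compressor's output.

```python
import gzip
import hashlib


def sha256(blob):
    """Format a digest the way registries do: algorithm prefix plus hex."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


content = b"identical uncompressed layer contents\n" * 10_000
half = len(content) // 2

# Two valid gzip encodings of the same bytes (mtime=0 keeps the output deterministic).
single_stream = gzip.compress(content, mtime=0)
two_members = gzip.compress(content[:half], mtime=0) + gzip.compress(content[half:], mtime=0)

print("compressed digests match:  ", sha256(single_stream) == sha256(two_members))
print("uncompressed digests match:",
      sha256(gzip.decompress(single_stream)) == sha256(gzip.decompress(two_members)))
```

The first comparison is False and the second is True: the content is the same, but a registry that addresses blobs by their compressed digest sees two distinct layers.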

Depot, on the other hand, aims to be the fastest place to build software, and so we've placed a higher premium on compression speed. We've modified our builders to use pigz for faster compression out of the box. So if you are currently building with the depot build command, you are already taking advantage of multi-threaded compression.

Zstandard is natively supported by containerd and can be used for both compression and decompression. Standard zstd compression is within the margin of error for compression time compared to pigz, and will theoretically be faster for decompression.

In this test, using zstd at level 9 significantly increased the compression time while only providing a modest improvement in the compression ratio. It's a good idea to run similar benchmark tests with your own container builds to see what will ultimately be the best choice for you over the long term.
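
If you want to run a similar comparison on your own images, a small harness along these lines is one starting point. It is a sketch, not the benchmark used for this post: it shells out to whichever of gzip, pigz, and zstd are installed, skips the rest, and holds the compressed output in memory, which is only reasonable for modest inputs. The sample file is synthetic. Point benchmark() at a real image tarball for meaningful numbers.

```python
import os
import shutil
import subprocess
import tempfile
import time

# Each command writes the compressed bytes to stdout (-c) and keeps the input file.
CANDIDATES = {
    "gzip": ["gzip", "-c"],
    "pigz": ["pigz", "-c"],
    "zstd": ["zstd", "-c"],
    "zstd-9": ["zstd", "-9", "-c"],
}


def benchmark(path):
    """Return {label: (seconds, compressed_bytes)} for each installed tool."""
    results = {}
    for label, command in CANDIDATES.items():
        if shutil.which(command[0]) is None:
            continue  # tool not installed on this machine
        start = time.perf_counter()
        proc = subprocess.run(command + [path], stdout=subprocess.PIPE,
                              stderr=subprocess.DEVNULL, check=True)
        results[label] = (time.perf_counter() - start, len(proc.stdout))
    return results


def make_sample(size_bytes=4 << 20):
    """Write repetitive sample data to a temp file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".tar") as handle:
        handle.write(b"sample layer data\n" * (size_bytes // 18))
        return handle.name


sample = make_sample()
try:
    results = benchmark(sample)
finally:
    os.unlink(sample)

for label, (seconds, size) in results.items():
    print(f"{label:7s} {seconds:6.3f}s {size:10d} bytes")
```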

Comparison of Decompression Times

Often the most pertinent metric is decompression time. For a Kubernetes deployment that needs to scale up quickly, we need to ensure that our images can be decompressed quickly.

Zstandard is significantly faster in our test than pigz, and interestingly, decompressing zstd-9 was roughly the same (actually slightly faster in this run) as the default zstd compression level. So with a little more compute up front, some bandwidth can be saved in the pull process without affecting the decompression time.

Luckily, as we said before, pigz will be used by containerd for decompression if it is installed on the host machine, so we can still take advantage of parallel decompression in most situations. If you are currently using gzipped images, which is the default for all images, you should definitely consider installing pigz on your host machine.

Conclusion

Determining the best option often comes down to various factors, such as data type, core count, and network speed. But in general, adding zstd to your builds is an easy "one-liner" improvement that can save time, bandwidth, and potentially money.

Where zstd truly shines is in its decompression speed, which in this test was nearly 60% faster than pigz while producing a smaller file than gzip in the compression stage. That is an easy way to get your pods starting roughly twice as fast or save on your CI/CD bill.

If you are building images with depot build, you are already benefiting from multi-threaded compression in your build process via pigz. Whether you build with depot build or docker buildx build, you can also get faster decompression by switching your images to zstd: set compression=zstd in the --output flag, or in the outputs field when using the GitHub Action.

- uses: depot/build-push-action@v1
  with:
    context: .
    tags: '<org>/<image>:<tag>'
    outputs: 'compression=zstd,oci-mediatypes=true'