
Leveraging the .dockerignore File to Create Smaller Images

Keeping container image sizes small is one of the most common “best practice” tips out there. There is good reason for this: it is very easy for a complex Dockerfile and a large application to turn into a large container image.

A large container image can eventually become troublesome if left unchecked. When deploying a container into production, that production system must download the image from your registry. Ideally this process should be quick; however, limited bandwidth or network latency (e.g., downloading an image in London from a server in San Francisco) can cause it to take a long time.

If you are using services to build your containers, a large container could easily cause those services to time out. This is also true of deployment automation such as Puppet, SaltStack, or Ansible. Each of these tools has a maximum execution time, so a large image and a slow network connection could make for a messy or failed deployment.

With that said, there are several techniques for keeping images small. In today’s article, we will explore an often-ignored technique: using the .dockerignore file.

Exploring the .dockerignore File

The .dockerignore file is a special file that can be placed within the build context directory. The build context directory is the directory that we specify at the end of a docker build command. The file itself is a simple text file that contains a list of glob patterns for files and directories to exclude from the build context, which keeps them out of the final image.

By leveraging the .dockerignore file, we can exclude files and directories we do not need within our final image. To explain this better, let’s walk through a real world example.

Adding .dockerignore to an existing project

One of the most common locations to store a Dockerfile is the top level of the application’s code repository. My personal projects are no exception. For this article, we will go ahead and add a .dockerignore file to one of my personal open-source projects that leverages Docker.

To get started, let’s first clone the project.

$ git clone https://github.com/madflojo/automatron.git

With the project cloned, let’s take a look at the files included in this repository.

$ cd automatron/
$ ls -la
total 208
drwxr-xr-x  24 madflojo  users    816 Apr 15 23:00 .
drwxrwxrwt   7 root      users    238 Apr 15 23:00 ..
-rw-r--r--   1 madflojo  users     29 Apr 15 23:00 .coveragerc
drwxr-xr-x  13 madflojo  users    442 Apr 15 23:00 .git
-rw-r--r--   1 madflojo  users    834 Apr 15 23:00 .gitignore
-rw-r--r--   1 madflojo  users   5760 Apr 15 23:00 CONTRIBUTING.md
-rw-r--r--   1 madflojo  users    408 Apr 15 23:00 Dockerfile
-rw-r--r--   1 madflojo  users  11343 Apr 15 23:00 LICENSE
-rw-r--r--   1 madflojo  users    220 Apr 15 23:00 Procfile
-rw-r--r--   1 madflojo  users   4538 Apr 15 23:00 README.md
-rw-r--r--   1 madflojo  users   9411 Apr 15 23:00 actioning.py
drwxr-xr-x   4 madflojo  users    136 Apr 15 23:00 config
drwxr-xr-x   8 madflojo  users    272 Apr 15 23:00 core
-rw-r--r--   1 madflojo  users   7842 Apr 15 23:00 discovery.py
-rw-r--r--   1 madflojo  users    415 Apr 15 23:00 docker-compose.yml
drwxr-xr-x  10 madflojo  users    340 Apr 15 23:00 docs
-rw-r--r--   1 madflojo  users   2208 Apr 15 23:00 mkdocs.yml
-rw-r--r--   1 madflojo  users   8609 Apr 15 23:00 monitoring.py
drwxr-xr-x   9 madflojo  users    306 Apr 15 23:00 plugins
-rw-r--r--   1 madflojo  users    114 Apr 15 23:00 requirements.txt
-rw-r--r--   1 madflojo  users   5992 Apr 15 23:00 runbooks.py
drwxr-xr-x   5 madflojo  users    170 Apr 15 23:00 tests
-rw-r--r--   1 madflojo  users   1018 Apr 15 23:00 tests.py

With just a quick look, we can see several files and directories that could be omitted from a production Docker image, such as .git/, tests/, mkdocs.yml, and even the CONTRIBUTING.md file.

Let’s see if these files are included when we perform a docker build.

$ docker build -t automatron .

The Dockerfile within this repository adds files using the following instruction.

ADD . /

This instruction essentially adds all of the files located within the build directory to the / directory within the container. We can see this if we run the container executing the ls -la command.

$ docker run automatron ls -la / | grep 2017
-rw-r--r--   1 root root    29 Apr 16  2017 .coveragerc
drwxr-xr-x   8 root root  4096 Apr 16  2017 .git
-rw-r--r--   1 root root   834 Apr 16  2017 .gitignore
-rw-r--r--   1 root root  5760 Apr 16  2017 CONTRIBUTING.md
-rw-r--r--   1 root root   408 Apr 16  2017 Dockerfile
-rw-r--r--   1 root root 11343 Apr 16  2017 LICENSE
-rw-r--r--   1 root root   220 Apr 16  2017 Procfile
-rw-r--r--   1 root root  4538 Apr 16  2017 README.md
-rw-r--r--   1 root root  9411 Apr 16  2017 actioning.py
drwxr-xr-x   3 root root  4096 Apr 16  2017 config
drwxr-xr-x   2 root root  4096 Apr 16  2017 core
-rw-r--r--   1 root root  7842 Apr 16  2017 discovery.py
-rw-r--r--   1 root root   415 Apr 16  2017 docker-compose.yml
drwxr-xr-x   6 root root  4096 Apr 16  2017 docs
-rw-r--r--   1 root root  2208 Apr 16  2017 mkdocs.yml
-rw-r--r--   1 root root  8609 Apr 16  2017 monitoring.py
-rw-r--r--   1 root root   114 Apr 16  2017 requirements.txt
-rw-r--r--   1 root root  5992 Apr 16  2017 runbooks.py
drwxr-xr-x   5 root root  4096 Apr 16  2017 tests
-rw-r--r--   1 root root  1018 Apr 16  2017 tests.py

If we look above, we can see that all of the files from the build directory have been added to the container. Let’s start excluding some of these files, starting with the .git/ directory. I am starting with the .git/ directory because it is often large and easily overlooked.

The .git/ directory is a special directory that is used by git to store all of the version control meta information. This includes the details and contents of every commit in the project’s history.

This means the more active a project, the larger the .git/ directory will be. For my project, the .git/ directory is only 1MB in size. However, if we look at the Apache Cassandra project’s .git/ directory it is over 200MB in size. This is due to both the size of the codebase and the active nature of the project.

For our example, excluding the .git/ directory might not save much, but if we were building a container from the Cassandra project’s repository, excluding it would greatly reduce the size of the resulting container image.
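Before deciding whether excluding .git/ is worth it for a given repository, it can help to measure the directory first. The following is a minimal sketch, assuming a POSIX shell with du and cut available; the dir_size_kb helper is a hypothetical name used only for illustration.

```shell
# dir_size_kb is a hypothetical helper: print the size of a directory in KB.
dir_size_kb() {
  du -sk "$1" | cut -f1
}

# Compare the .git directory with the whole checkout.
# Run this from the top level of the repository.
if [ -d .git ]; then
  echo ".git:  $(dir_size_kb .git) KB"
  echo "total: $(dir_size_kb .) KB"
fi
```

If .git/ accounts for a large share of the total, it is a good first candidate for the .dockerignore file.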

With that said, let’s go ahead and add the .git/ directory to a newly created .dockerignore file. We can do this by adding the following:

.git

Once this line is added, let’s build the container again and check the resulting contents.

$ docker build -t automatron .
$ docker run automatron ls -la / | grep 2017
-rw-r--r--   1 root root    29 Apr 16  2017 .coveragerc
-rw-r--r--   1 root root     5 Apr 16  2017 .dockerignore
-rw-r--r--   1 root root   834 Apr 16  2017 .gitignore
-rw-r--r--   1 root root  5760 Apr 16  2017 CONTRIBUTING.md
-rw-r--r--   1 root root   408 Apr 16  2017 Dockerfile
-rw-r--r--   1 root root 11343 Apr 16  2017 LICENSE
-rw-r--r--   1 root root   220 Apr 16  2017 Procfile
-rw-r--r--   1 root root  4538 Apr 16  2017 README.md
-rw-r--r--   1 root root  9411 Apr 16  2017 actioning.py
drwxr-xr-x   3 root root  4096 Apr 16  2017 config
drwxr-xr-x   2 root root  4096 Apr 16  2017 core
-rw-r--r--   1 root root  7842 Apr 16  2017 discovery.py
-rw-r--r--   1 root root   415 Apr 16  2017 docker-compose.yml
drwxr-xr-x   6 root root  4096 Apr 16  2017 docs
-rw-r--r--   1 root root  2208 Apr 16  2017 mkdocs.yml
-rw-r--r--   1 root root  8609 Apr 16  2017 monitoring.py
-rw-r--r--   1 root root   114 Apr 16  2017 requirements.txt
-rw-r--r--   1 root root  5992 Apr 16  2017 runbooks.py
drwxr-xr-x   5 root root  4096 Apr 16  2017 tests
-rw-r--r--   1 root root  1018 Apr 16  2017 tests.py

As we can see from the resulting output, the container is now missing the .git/ directory.

The above is a simple example of using the .dockerignore file. At this point, we could simply add a similar entry for each file and directory we wish to omit and we could have a smaller resulting image. There is, however, an easier way.

As I mentioned earlier, the .dockerignore file understands Unix glob patterns. If, for example, we wanted to omit all files that started with a ., we could simply add .* to the file.

It is important to note that Unix style glob patterns are not regular expressions, and .* is a prime example of this. In a “glob” pattern, .* matches every name that starts with a period. In a regular expression, it would match any sequence of characters, essentially matching every file and directory.

Since the .dockerignore file uses Unix style glob patterns, we can safely add .* and only dot-files will be excluded.
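The difference between the two readings of .* can be checked directly in a shell. This is a rough sketch, not Docker's own matcher: it assumes the shell's case patterns are a close enough stand-in for Docker's glob matching on top-level names, and matches_glob and matches_regex are hypothetical helper names.

```shell
# Hypothetical helpers for illustration only. The shell's case patterns
# stand in for Docker's glob matching; grep -E stands in for a regex engine.
matches_glob() {   # usage: matches_glob NAME PATTERN
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

matches_regex() {  # usage: matches_regex NAME REGEX (anchored to the whole name)
  printf '%s\n' "$1" | grep -Eq "^($2)\$"
}

# Try the same pattern text, .*, under both interpretations.
for name in .git .gitignore README.md tests.py; do
  if matches_glob "$name" '.*'; then glob=yes; else glob=no; fi
  if matches_regex "$name" '.*'; then regex=yes; else regex=no; fi
  echo "$name glob=$glob regex=$regex"
done
```

Read as a glob, the pattern matches only .git and .gitignore; read as a regular expression, it matches all four names.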

In addition to .*, let’s go ahead and add a few more items to omit.

.*
docs
mkdocs.yml
docker-compose.yml
test*
*.md

In the above, we have some clearly specified items such as docs/, docker-compose.yml, and mkdocs.yml. We also have some glob patterns such as test*, which will cause us to omit tests/ and tests.py. We also have another interesting one: *.md, which will cause Docker to omit any markdown file such as README.md and CONTRIBUTING.md.

Let’s see how this comes together by running another build and ls -la.

$ docker build -t automatron .
$ docker run automatron ls -la / | grep 2017
-rw-r--r--   1 root root   408 Apr 16  2017 Dockerfile
-rw-r--r--   1 root root 11343 Apr 16  2017 LICENSE
-rw-r--r--   1 root root   220 Apr 16  2017 Procfile
-rw-r--r--   1 root root  9411 Apr 16  2017 actioning.py
drwxr-xr-x   3 root root  4096 Apr 16  2017 config
drwxr-xr-x   2 root root  4096 Apr 16  2017 core
-rw-r--r--   1 root root  7842 Apr 16  2017 discovery.py
-rw-r--r--   1 root root  8609 Apr 16  2017 monitoring.py
-rw-r--r--   1 root root   114 Apr 16  2017 requirements.txt
-rw-r--r--   1 root root  5992 Apr 16  2017 runbooks.py

The output this time is quite a bit less than our previous run. We can see that we are now missing the files we wanted to omit.

At this point, we have achieved our goal: we eliminated files that were not needed within our final image. There is, however, one file missing that I wanted to include.


Using ! to include files

The missing file is the README.md file. In the .dockerignore file, we added a line *.md to omit all markdown files. My project has a few markdown files already and I fully expect more to pop up in the future.

The problem is I’d like to include only the README.md and no other markdown files. I’d also like to not have to specify each and every markdown file to accomplish this. Luckily, Docker provides this ability.

By adding the following, we can keep our removal of all markdown files but still retain our README.md:

.*
docs
mkdocs.yml
docker-compose.yml
test*
*.md
!README.md

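With the above, we simply added the README.md file with the ! character in front of it. This tells Docker to make an exception: README.md is kept even though it matches the earlier *.md rule. Order matters here. The last line that matches a file decides whether it is excluded, so the exception must come after the rule it overrides.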

Let’s go ahead and see what files our container includes with our changes applied.

$ docker build -t automatron .
$ docker run automatron ls -la / | grep 2017
-rw-r--r--   1 root root   408 Apr 16  2017 Dockerfile
-rw-r--r--   1 root root 11343 Apr 16  2017 LICENSE
-rw-r--r--   1 root root   220 Apr 16  2017 Procfile
-rw-r--r--   1 root root  4538 Apr 16  2017 README.md
-rw-r--r--   1 root root  9411 Apr 16  2017 actioning.py
drwxr-xr-x   3 root root  4096 Apr 16  2017 config
drwxr-xr-x   2 root root  4096 Apr 16  2017 core
-rw-r--r--   1 root root  7842 Apr 16  2017 discovery.py
-rw-r--r--   1 root root  8609 Apr 16  2017 monitoring.py
-rw-r--r--   1 root root   114 Apr 16  2017 requirements.txt
-rw-r--r--   1 root root  5992 Apr 16  2017 runbooks.py

With the !README.md entry added, we can now see our README.md was included but not our CONTRIBUTING.md. This means our instruction to omit all markdown files (*.md) was applied to all except the README.md.
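To see how exclusion rules and ! exceptions interact, the rules can be replayed in a small shell function. This is a simplified sketch under stated assumptions, not Docker's implementation: it handles top-level names only, uses the shell's case patterns in place of Docker's matcher, and follows the documented behavior that the last matching line wins. The is_excluded helper is a hypothetical name.

```shell
# is_excluded is a hypothetical helper, not part of Docker.
# usage: is_excluded NAME IGNORE_FILE
# Exit status 0 means NAME would be excluded, 1 means it would be kept.
is_excluded() {
  name=$1
  excluded=1                      # 1 = keep, 0 = exclude (shell exit codes)
  while IFS= read -r pattern || [ -n "$pattern" ]; do
    case "$pattern" in
      ''|'#'*) continue ;;        # skip blank lines and comments
      '!'*)                       # exception rule: re-include on match
        case "$name" in ${pattern#?}) excluded=1 ;; esac ;;
      *)                          # ordinary rule: exclude on match
        case "$name" in $pattern) excluded=0 ;; esac ;;
    esac
  done < "$2"
  return "$excluded"
}

# Replay the rules from this article against a few file names.
ignore_file=$(mktemp)
printf '%s\n' '.*' 'docs' 'test*' '*.md' '!README.md' > "$ignore_file"
for name in .git README.md CONTRIBUTING.md tests.py actioning.py; do
  if is_excluded "$name" "$ignore_file"; then
    echo "exclude $name"
  else
    echo "keep    $name"
  fi
done
rm -f "$ignore_file"
```

In this sketch, README.md and actioning.py are kept while the other three names are excluded. Moving !README.md above *.md would cause README.md to be excluded again, because the later *.md line would then be the last match.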

Summary

In this article, we covered how to leverage the .dockerignore file to exclude unnecessary files and directories from the container build. As we found out, using the .dockerignore file is very simple. Do you have any .dockerignore tips or tricks? Add them to the comments or tweet them to us.

Benjamin Cane

Benjamin Cane is a systems architect in the financial services industry. He writes about Linux systems administration on his blog and has recently published his first book, Red Hat Enterprise Linux Troubleshooting Guide.