Lets learn "How to write good Dockerfiles"

 


In this tutorial, we'll learn the best practices for building efficient Docker images with a Dockerfile. A well-written Dockerfile can reduce image build time significantly. It can also save money on storage, because an efficient image occupies far less disk space than an inefficient one. So it is worth taking a few things into account while writing a Dockerfile.

Before going further, let's cover what a Dockerfile is, in case you don't know. A Dockerfile is a text file containing a set of instructions that the Docker daemon uses to build a Docker image. For Docker to build an image from it, the file has to follow a specific format and use a defined set of instructions.

Docker image/container layers


Every Docker image consists of layers. Each instruction in a Dockerfile adds a layer, and every layer that adds or changes files (RUN, COPY, ADD) increases the size of the image; instructions such as LABEL or CMD only record metadata. All the layers of an image are read-only: the content of a layer cannot be modified once the image is built. A container is nothing but one extra layer on top of the image layers, and that container layer is read-write. So if you create multiple containers from the same image, they all share the same image layers and differ only in their container layer; any change made in a container takes effect in its own container layer.
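As a rough illustration of the layer model described above, here is a small annotated Dockerfile. The base image (alpine:3.19) and the paths are assumptions chosen for the example, not something this tutorial prescribes.

```dockerfile
# Assumed base image; its layers become the bottom layers of the new image.
FROM alpine:3.19

# Metadata only: no files are added to the image.
LABEL maintainer="letslearn@gmail.com"

# Filesystem layer: the file written here is stored in a new read-only layer.
RUN echo 'foo bar' > /etc/foobar.config

# Filesystem layer: files copied from the build context.
COPY . /src

# Metadata only: default command for containers started from the image.
CMD ["sh"]
```

After building, `docker history <image>` lists the layers of the image and the size each one contributes.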

Now that you are familiar with the concept of layers, let's go through the rules for writing a Dockerfile that builds efficient images.

Keep the number of layers minimal by consolidating instructions

As discussed before, every layer that writes files adds to the size of the image. So we should keep the number of instructions in the Dockerfile as small as is practical by consolidating related instructions. For example,

RUN echo 'foo bar' >> /etc/foobar.config
RUN cp /etc/foobar.config /etc/foobaz.config

can be written as

RUN echo 'foo bar' >> /etc/foobar.config && cp /etc/foobar.config /etc/foobaz.config
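Consolidation matters most when a step creates files that a later step deletes. Because layers are read-only, a file removed in a separate RUN instruction still occupies space in the earlier layer. The sketch below assumes a yum-based base image (centos:7 is only an example) and an illustrative package list.

```dockerfile
# Assumed yum-based base image, used only for illustration.
FROM centos:7

# Install and clean up in one RUN instruction, so the yum cache
# is never committed to any layer of the image.
RUN yum install -y git make \
    && yum clean all \
    && rm -rf /var/cache/yum
```

If the cleanup were placed in its own RUN instruction, the image would not get any smaller, because the cache files would already be stored in the previous layer.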

Avoid unnecessary packages

Install only the packages the image actually needs. Every unnecessary package increases the size of the built image.
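Package managers often pull in optional extras by default. As one hedged example, assuming a Debian-based image rather than the yum-based ones used elsewhere in this tutorial, apt-get can be told to skip recommended packages:

```dockerfile
# Assumed Debian-based image; the package list is illustrative.
FROM debian:bookworm-slim

# --no-install-recommends skips optional "recommended" packages;
# removing the apt lists keeps the package index out of the layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends git make python3 \
    && rm -rf /var/lib/apt/lists/*
```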

Use a minimal base docker image

Selecting the base image is a very important step. This rule is closely related to the previous one, `Avoid unnecessary packages`, because a larger base image often ships packages you don't need. Several different base images can usually provide the same behavior; of those, pick the smallest one that meets your requirements. Alpine-based images, for example, are usually much smaller than comparable images.
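A minimal sketch of an Alpine-based image follows. The tag (3.19) and the package list are assumptions for illustration; package names on Alpine can differ from those on other distributions.

```dockerfile
# Assumed tag; pin whichever Alpine release you have tested.
FROM alpine:3.19

# --no-cache fetches the package index on the fly and does not store it
# in the image, so no separate cleanup step is needed.
RUN apk add --no-cache python3 make git

COPY . /src
CMD ["sh"]
```

One thing to keep in mind is that Alpine uses musl libc instead of glibc, so software that depends on glibc may need extra work to run there.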

Make use of docker caching

Docker's build cache can reduce image build time considerably. Docker walks through the instructions in the Dockerfile in order and, for each one, checks whether the result can be re-used from a previously built layer. If it can, Docker re-uses the cached layer. As soon as it reaches an instruction whose result cannot be taken from the cache and has to be executed, Docker stops consulting the cache and executes every remaining instruction in the Dockerfile.

Reading the Dockerfile from top to bottom:

INSTRUCTION 1 <- cache hit, result re-used
INSTRUCTION 2 <- cache hit, result re-used
INSTRUCTION 3 <- instruction changed, cache miss
INSTRUCTION 4 <- don't check the cache
INSTRUCTION 5 <- don't check the cache
...
INSTRUCTION N <- don't check the cache

As discussed before, the instructions in a Dockerfile produce layers that are persisted on the system when the image is built. When the image is rebuilt, Docker re-uses the persisted layer for every instruction whose result can be re-used.

Let's go through an example

Dockerfile 1:

  1. FROM <base-image>:<version>
  2. LABEL maintainer="letslearn@gmail.com"
  3. RUN yum install -y python3 make git ( + other required packages )
  4. COPY ./ /src   <- cache miss whenever the sources change
  5. RUN make -C /src build-app
  6. ...
  7. CMD [ "sh" ]

Dockerfile 2:

  1. FROM <base-image>:<version>
  2. LABEL maintainer="letslearn@gmail.com"
  3. COPY ./ /src   <- cache miss whenever the sources change
  4. RUN yum install -y python3 make git ( + other required packages ) && make -C /src build-app
  5. ...
  6. CMD [ "sh" ]

Which of these two Dockerfiles is better? One could say the second, because it has fewer instructions and so produces a smaller image. But there is a trade-off here between the size of the image and the time it takes to build.

With the first Dockerfile, when the image is built repeatedly, the first 3 steps are served from the cache as long as they are unchanged. From the 4th instruction onward Docker executes everything, because the sources copied into /src keep changing and cause a cache miss. The advantage is that every build skips the time needed to install all those packages (step 3). Like it or not, step 3 is a very expensive step.

With the second Dockerfile, the cache is invalidated at step 3 (the COPY), so step 4 is executed again on every build. All the packages are reinstalled each time, which increases the build time a lot.

So there are cases where image build time and image size pull in opposite directions. In such cases, the maintainer has to decide which way to go. In the example above, we'll go with the first Dockerfile. In general, people prefer saving build time over saving image size, but there are exceptions.
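The same ordering idea applies to application dependencies: copy the files that rarely change before the ones that change often. The sketch below assumes a hypothetical Python project with a requirements.txt and an app.py; those file names and the python:3.12-slim base image are illustrative only.

```dockerfile
# Assumed base image for a hypothetical Python application.
FROM python:3.12-slim
WORKDIR /src

# Copy only the dependency manifest first. This layer and the install
# step below stay cached until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Source files change often, so they are copied last. A change here
# invalidates the cache only from this instruction onward.
COPY . .

# app.py is a hypothetical entry point.
CMD ["python", "app.py"]
```

With this layout, editing application code triggers only the final COPY again, while the dependency installation is re-used from the cache.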

NOTE: To build an image without using the cache, run the docker build command with the --no-cache option.

Keep a .dockerignore file

When Docker builds an image from a Dockerfile, the first step is to assemble the build context. By default the context includes every file in the directory passed to the docker build command (often the current directory) and all of its subdirectories. This can include files the build does not need. The recommended approach is either to keep the Dockerfile in a directory that contains only what the build needs, or to create a .dockerignore file listing the files and directories that Docker should leave out of the context, for example .git, build output directories, and *.log files. A small build context saves build time and so improves build performance.

Sort multi-line arguments

Keep the list of packages to install in sorted order, one package per line. This makes it easy to spot a package that has been listed twice and keeps the list easy to review and update.
For example,
RUN yum update -y && \
    yum install -y git \
                   httpd \
                   java \
                   python


That's it! Taking all of the above rules into account, you can write a good Dockerfile.

HOPE YOU LIKE THIS TUTORIAL. STAY TUNED FOR MORE TUTORIALS :)
