Dockerfile - Best Practices // Docker 101


One of the great benefits of using Docker as a containerization engine is that it is thin and lightweight, which means less overhead and generally better performance than traditional non-containerized deployments (where dependencies from one part of the application can cause bottlenecks in another part hosted on the same machine).

Another advantage of Dockerfiles is that they give you repeatable “recipes” for deploying an identical replica of the application everywhere (local environment, staging, production).

However, a badly written Dockerfile could actually result in more frustration than benefit for everyday development and – more importantly – application performance.

Here are some Dockerfile best practices and tips on how to avoid that.

Incorrect Dockerfile ordering leads to cache bust

FROM debian
COPY . /app
RUN apt-get update
RUN apt-get -y install ssh vim

When starting a new project, the above Dockerfile might look like a good idea. You have a repeatable recipe you can run again and again: one that copies your application source code to the /app folder within the image and installs SSH and Vim for later use. After the first build of the image, subsequent builds will use the cache and you will be fine.

However, if you change your application source code and rebuild the image, you will notice that the second instruction (COPY) busts the cache of all subsequent steps, meaning that SSH and Vim are reinstalled every single time you change a line of code. Not great. Here’s a better approach:

FROM debian
RUN apt-get update
RUN apt-get -y install ssh vim
COPY . /app

No more cache-busting. Lines 2 and 3 are cached and reused even when the source code changes. Rule of thumb: put instructions that change rarely towards the top.
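To illustrate the rule of thumb on a typical project, here is a minimal sketch. It assumes a Node.js application with package.json and package-lock.json at the project root; the base image tag and file names are assumptions, not part of the example above. The dependency manifests change less often than the source code, so they are copied and installed first:

```dockerfile
# Assumption: a Node.js project; adjust the base image and file names to your stack
FROM node:20
WORKDIR /app

# Dependency manifests change rarely, so copy them first
COPY package.json package-lock.json ./

# Re-run only when the manifests above change
RUN npm ci

# Source code changes often, so copy it last
COPY . .
```

With this layout, editing a source file only invalidates the final COPY layer, and the dependency installation is served from the cache.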

Be specific about what you copy, to avoid cache busts

Let’s check out this example:

FROM php:7.1-apache

COPY . /app

ENV APACHE_DOCUMENT_ROOT=/app/dist
RUN sed -ri -e 's!/var/www/html!${APACHE_DOCUMENT_ROOT}!g' /etc/apache2/sites-available/*.conf
RUN sed -ri -e 's!/var/www/!${APACHE_DOCUMENT_ROOT}!g' /etc/apache2/apache2.conf /etc/apache2/conf-available/*.conf

The above builds an Apache image with PHP that serves static files from the dist directory of our source code (e.g. we have an Angular application that, when built, generates static HTML, JS, and CSS in the dist folder).

This is okay, but again it means that every change to any file in the project, including configuration or metadata files that are never served, busts the cache for the COPY step and everything after it, which slows down Docker builds.

A better option would be to copy just the “artifacts” instead of the source code:

# Copy only the ./dist folder
COPY ./dist /app/dist
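Put together with the rest of the example, the Dockerfile might look like the sketch below. It assumes the application has already been built on the host or in CI, so that ./dist exists before docker build runs. The COPY is also moved below the Apache configuration steps, which applies the ordering rule from the previous section:

```dockerfile
FROM php:7.1-apache

# Configuration rarely changes, so it stays near the top
ENV APACHE_DOCUMENT_ROOT=/app/dist
RUN sed -ri -e 's!/var/www/html!${APACHE_DOCUMENT_ROOT}!g' /etc/apache2/sites-available/*.conf
RUN sed -ri -e 's!/var/www/!${APACHE_DOCUMENT_ROOT}!g' /etc/apache2/apache2.conf /etc/apache2/conf-available/*.conf

# Copy only the build artifacts; changes elsewhere in the project leave the cache intact
COPY ./dist /app/dist
```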

Combine “apt-get update” with “apt-get install” to avoid outdated packages

FROM debian
RUN apt-get update
RUN apt-get -y install ssh vim default-jdk

The above may seem like a valid Dockerfile at first, but under the hood the build process creates separate layers for the two RUN instructions and caches them independently. This means that even if you add another package to the apt-get install step, the cached apt-get update layer is reused and the update is not run again (unless you manually invalidate the cache). The install then works from a stale package index, so it may pull outdated versions or fail on packages that are no longer available.

A simple solution is to combine the update and install steps, leading to one cacheable unit that will be invalidated atomically:

FROM debian
RUN apt-get update && \
    apt-get -y install ssh vim default-jdk
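As a sketch of how this behaves when the package list changes, suppose curl is added (a hypothetical addition for illustration; default-jdk is Debian's package name for the JDK). Because the text of the single RUN instruction changes, its cache is invalidated and apt-get update runs again together with the install. Listing one package per line, sorted alphabetically, is a common convention that keeps such changes easy to review:

```dockerfile
FROM debian
# Adding curl changes this instruction, so update and install both re-run
RUN apt-get update && \
    apt-get -y install \
        curl \
        default-jdk \
        ssh \
        vim
```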

Conclusion

Overall, a good Dockerfile can make a real difference to build times and reliability in your everyday development and application release experience.

Do you have any other tips that you learned from experience while working with Dockerfiles?