Docker introduced multi-stage builds in version 17.05. They answer one of the most challenging tasks when building a Docker image: keeping the size down. The multi-stage feature helps you build small, optimized Docker images.
Back in the day, you often ended up with two Dockerfiles: one for development and one for production. The development file was bloated with artifacts that you did not need in production, while the production one was optimized and usually didn't include any build artifacts. You therefore ended up maintaining two Dockerfiles, one per environment, which increased the maintenance cost.
Some experienced users built incredible contraptions out of scripts to share the same Dockerfile between development and production while keeping the size down. Multi-stage builds let you optimize a single Dockerfile and stay productive in both development and production.
To understand how multi-stage builds work, I'll show you an example using React and Nginx. The two are often combined with Docker: one stage builds the React app, and Nginx serves the resulting files to the user.
To get started, you'll need:
Check the list of the most used Docker commands that you must know before starting this tutorial.
Let's build the following React app. This project is a small React app that helps you bootstrap projects with React & Redux. I've used it for many months to quickly get started on the development of React applications. Some of the dependencies are outdated, but it is enough to show what you can achieve with multi-stage builds.
Start by cloning the repository. You can even clone your own React project and adapt the Dockerfiles shown below to get Docker Multi Staging working for your project.
git clone https://github.com/StanGirard/ReactBoilerplate && cd ReactBoilerplate
This React project uses a single, common Dockerfile to build and run the app. It isn't optimized and is mostly meant for development purposes.
Inside ReactBoilerplate, you'll find the following Dockerfile:
It is a reasonably common Dockerfile that copies the current folder inside the container, installs the necessary dependencies, and runs the application. It is, however, not optimized for production. If you are using your own project, just put the Dockerfile at the root of it.
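The exact file in the repository may differ; a typical development Dockerfile matching that description (base image taken from the stage example later in this article) would look like this sketch:

```dockerfile
FROM node:13.12.0-alpine

WORKDIR /app

# Install dependencies first to take advantage of Docker layer caching
COPY package.json package-lock.json ./
RUN npm install --silent

# Copy the rest of the source code
COPY . ./

# Start the development server
CMD ["npm", "start"]
```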
We can optimize the size of the container by not copying all the files. In the ReactBoilerplate folder, you'll find a .dockerignore file containing:
This file tells Docker's COPY instruction not to import the listed files. If you have more files or folders that you don't want to include in your image, don't forget to add them here. We do not copy node_modules because it is relatively big and would bloat our image. The build folder has no use in this image either and thus should not be included.
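Based on the description above, the .dockerignore would contain at least these entries (the repository's file may list more):

```
node_modules
build
```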
Want to know more about containers? Read this article.
Run the following command to build the container:
docker build -f Dockerfile -t react-b:latest .
You can run the container with the command:
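A typical invocation would be the following (the port mapping is an assumption; 3000 is the default port of the React development server):

```shell
docker run -p 3000:3000 react-b:latest
```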
Our end goal is to produce an optimized image for production purposes. The image that we just created takes 369 MB of storage and uses 242.9 MB of RAM while idle. Docker multi-stage builds should help improve our image's performance and keep the size and RAM usage down. If you run your images inside a Kubernetes cluster, smaller images and lower RAM usage mean more pods running on fewer nodes.
Let's use the builder feature. It allows us to define multiple stages in our Docker build. If you use multiple FROM keywords in your Dockerfile, only the last stage ends up in the final image.
First, we need to modify the Dockerfile to optimize it for production purposes. Replace the CMD ["npm", "start"] at the end of the Dockerfile with RUN npm run build, and replace RUN npm install --silent with RUN npm install --silent --only=prod. The build step creates a build folder containing the code optimized for production.
Here is the modified Dockerfile:
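The repository's exact file may differ; a plausible sketch of the production-optimized Dockerfile, applying the two changes described above, is:

```dockerfile
FROM node:13.12.0-alpine

WORKDIR /app

# Only install production dependencies
COPY package.json package-lock.json ./
RUN npm install --silent --only=prod

COPY . ./

# Produce the optimized static build in /app/build
RUN npm run build
```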
Now we need to add a stage to our Dockerfile. We want to run our optimized production application inside an Nginx container. Inside our Dockerfile, we need to add a stage using the Nginx image and copy all the Nginx files.
First of all, create a file called nginx.conf and paste this code into it:
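The original configuration is not reproduced here; a minimal nginx.conf for serving a single-page React app, assuming the build is copied into Nginx's default /usr/share/nginx/html root, would look like:

```nginx
server {
  listen 80;

  location / {
    root /usr/share/nginx/html;
    index index.html;
    # Fall back to index.html so client-side routes still resolve
    try_files $uri $uri/ /index.html;
  }
}
```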
Modify the existing file: add the stage name after the as keyword.
FROM node:13.12.0-alpine as builder
By changing the line above, we told Docker that this image is a stage named builder. We will be able to reference it in other stages and copy specific files and folders from it with COPY --from=<stage_name>.
Our multi-stage Dockerfile is starting to take shape. We just need to add the Nginx stage, and we are done.
The COPY --from=builder instruction tells Docker to copy the files inside the /app/build/ folder of the builder stage into our Nginx container.
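Putting both stages together, the complete multi-stage Dockerfile would look roughly like this (the nginx:alpine tag and the destination paths are assumptions based on Nginx defaults, not taken from the repository):

```dockerfile
# Stage 1: build the React app
FROM node:13.12.0-alpine as builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install --silent --only=prod
COPY . ./
RUN npm run build

# Stage 2: serve the static build with Nginx
FROM nginx:alpine
COPY nginx.conf /etc/nginx/conf.d/default.conf
COPY --from=builder /app/build/ /usr/share/nginx/html
```

Only the second stage ends up in the final image; the node image, the node_modules folder, and the build tooling are all discarded.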
You can run the newly created container with:
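Assuming the Nginx stage listens on port 80 (its default), the command would be:

```shell
docker run -p 80:80 react-b:latest
```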
The newly created container only uses 6.4 MB of RAM while idle and only takes 32 MB of storage. That is a huge difference.
| Idle usage | Normal | Multi-stage |
| --- | --- | --- |
| RAM | 242.9 MB | 6.4 MB |
| Storage | 369 MB | 32 MB |
In this tutorial, we have just scratched the surface of what you can do with Docker's multi-stage features.
You can go further and implement many more features, such as:
In order to build your container and stop at a specific stage, you need to run this command:
docker build --target builder -f Dockerfile -t react-b:latest .
The --target option allows you to specify the stage at which you'd like to stop. It can be useful if your last stage is only used in production or for testing. In this example, we stopped before copying the build to Nginx.
Let's say that you have many stages in your file and need to use a previously created stage for the following one. All you need to do is:
FROM builder as build2
This tells Docker to use the builder stage as the base for this new stage. It can be used, for example, to run tests on the output of a previous stage.
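For instance (a hypothetical sketch; the test command is an assumption, not taken from the repository):

```dockerfile
# First stage: install dependencies and build
FROM node:13.12.0-alpine as builder
WORKDIR /app
COPY . ./
RUN npm install --silent

# Second stage: reuse everything from builder to run the tests
FROM builder as build2
RUN npm test
```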
You can tell Docker to copy files from a Docker image that is either available locally or in a remote Docker registry. To do so, all you need is:
COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf
Using multi-stage builds has many advantages, but some drawbacks may keep you away from this neat feature.
In our example above, the multi-stage image is more than ten times smaller than the original one. When you upload containers to the cloud, that size difference makes a significant impact. Imagine that you upload your image to a registry such as ECR (Elastic Container Registry) from AWS.

A couple of minutes of upload time is not much to wait. But if your project needs multiple images to build, test, and run, the total size grows quickly. Shipping only the container's runtime environment helps you decrease the image size and save minutes and bandwidth.
If you are running a Kubernetes cluster, each pod needs to pull the image at every update, and bigger images mean longer download times. With dozens of updates per day across hundreds of services, a reduced image size significantly decreases your services' downtime.
Keeping your images secure is not an easy job. In a single-stage container, you can end up shipping vulnerabilities brought in by build-time dependencies that the running application does not require. By removing unnecessary dependencies and reducing the attack surface of your image, you drastically reduce the security risks.
Bandwidth is relatively cheap on every cloud provider, so the cost difference between a big and small image is not that huge. However, if your containers are downloaded thousands of times across multiple regions, the cost can factor into your choice.
Sometimes, especially for compiled applications, shared libraries are needed at runtime. It can be tricky to include those libraries while keeping the image as small as possible on a barebones base image.
The advantages of a multi-stage build are numerous, but it only really pays off in specific use cases. In systems with automated CI/CD pipelines and multiple applications requiring Docker images, the size difference can significantly decrease your roll-out times.