Understanding the Docker cache
One of the main points of confusion when building images is how Docker layers work.
Each of the commands in a Dockerfile is executed consecutively, on top of the previous layer. If you are comfortable with Git, you'll notice that the process is similar: each layer stores only the changes made since the previous step.
This allows Docker to cache quite aggressively, as any layer before a change is already calculated. In this example, we update the available packages with apk update, then install the python3 package, before copying the example.txt file. Any change to the example.txt file will only re-execute the last two steps on top of layer be086a75fe23. This speeds up the rebuilding of images.
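As a rough illustration of the steps just described, a Dockerfile along these lines would behave that way. This is a sketch, not the original listing: the alpine base image and the /opt/example.txt destination path are assumptions.

```dockerfile
# Assumed base image; any Alpine-based image with apk would work the same way.
FROM alpine

# These two layers stay cached as long as the lines above them do not change.
RUN apk update
RUN apk add python3

# Editing example.txt invalidates the cache from this instruction onward.
# The destination path is an assumption for illustration.
COPY example.txt /opt/example.txt
```

Rebuilding after editing example.txt should reuse the cached layers for the two RUN instructions and only redo the work from COPY onward.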
It also means that you need to construct your Dockerfiles carefully to avoid invalidating the cache. Start with the operations that change very rarely, such as installing the project dependencies, and finish with the ones that change more often, such as adding your code. The annotated Dockerfile for our example indicates where the cache is used.
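A minimal sketch of that ordering for a Python project might look like the following. It is not the annotated Dockerfile mentioned above; the python:3.12-alpine base image, the /opt/app directory, and the requirements.txt filename are assumptions chosen for illustration.

```dockerfile
# Assumed base image that already ships Python and pip.
FROM python:3.12-alpine
WORKDIR /opt/app

# Changes rarely: copy only the dependency list and install from it.
# This layer is rebuilt only when requirements.txt itself changes.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Changes often: copy the application code last, so that editing code
# does not force the dependencies to be reinstalled.
COPY . .
```

If the code were copied before running pip install, every code change would invalidate the dependency layer as well.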
This also means that adding a new layer never makes an image smaller, even if that layer removes data, because the previous layer is still stored on disk. If you want to remove cruft from a step, you'll need to do so in the same step.
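One way to clean up within the same step is to chain the install, the work, and the removal in a single RUN instruction. The sketch below assumes a Python project that needs a C compiler only while installing its dependencies; the base image, the package names, and the .build-deps label are illustrative assumptions.

```dockerfile
# Assumed base image; requirements.txt is a hypothetical dependency list.
FROM python:3.12-alpine
COPY requirements.txt /opt/requirements.txt

# Install the build tools, use them, and delete them in ONE instruction.
# Because it all happens in a single layer, the compiler is never stored
# in the final image.
RUN apk add --no-cache --virtual .build-deps gcc musl-dev \
    && pip install --no-cache-dir -r /opt/requirements.txt \
    && apk del .build-deps
```

Splitting the same commands across three RUN instructions would leave the compiler stored in an earlier layer, and the image would not shrink.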
There's another practical consideration. Containers are a great tool to simplify and reduce your service to the minimum. With a bit of investment, you'll get great results and keep your containers small and to the point.
There are several practices for keeping your images small. Other than being careful not to install extra elements, the main ones are creating a single, complicated layer that installs and uninstalls, and using multi-stage images. Multi-stage Dockerfiles let you refer to a previous intermediate stage and copy data from it. See the Docker documentation (https://docs.docker.com/develop/develop-images/multistage-build/) for details.
You can learn more about the differences between the two strategies in this article: https://pythonspeed.com/articles/smaller-python-docker-images/.
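To give a rough idea of the multi-stage approach, the sketch below builds the dependencies in one stage and copies only the result into a clean final image. The stage name builder, the /opt/venv path, the base image, and requirements.txt are all assumptions made for illustration.

```dockerfile
# Stage 1: a build environment that includes the compiler.
FROM python:3.12-alpine AS builder
RUN apk add --no-cache gcc musl-dev
COPY requirements.txt /opt/requirements.txt
# Install the dependencies into a self-contained virtual environment.
RUN python -m venv /opt/venv \
    && /opt/venv/bin/pip install --no-cache-dir -r /opt/requirements.txt

# Stage 2: the final image starts again from a clean base.
FROM python:3.12-alpine
# Copy only the installed environment from the first stage; the compiler
# and any other build leftovers stay behind in the builder stage.
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
```

Only the layers of the last stage end up in the resulting image, so nothing installed in the builder stage adds to its size unless it is explicitly copied over.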
We'll create a multi-stage container in the next step.