Docker packaging is an exercise in shoving square pegs into round holes, over and over and over again.
Consider the Poetry packaging tool for Python. One of Poetry’s features can make Docker rebuilds slower, by breaking Docker’s caching.
And it’s not a bad feature, there’s nothing really wrong with it, it just—doesn’t fit.
Let’s see what the problem is, go over some workarounds—which have their own problems, obviously—and then briefly consider why everything about Docker packaging is always slightly broken.
Note: Outside the very specific topic under discussion, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.
To ensure you’re writing secure, correct, fast Dockerfiles, consider my Quickstart guide, which includes a packaging process and 60+ best practices.
Recap: faster rebuilds by installing dependencies separately
As a reminder:
- When you rebuild a Docker image, Docker can reuse cached layers to speed up the rebuild. The cache is invalidated if you COPY in a changed file.
- When installing your dependencies and code, you’ll therefore want to copy in the dependencies file first, and separately. This lets dependency installation be sped up by caching even if your code changes.
For example, we copy requirements.txt in first, install dependencies using it, and then COPY in the rest of the code:
    FROM python:3.8-slim-buster
    COPY requirements.txt /tmp/
    RUN pip install -r /tmp/requirements.txt
    COPY . /tmp/myapp
    RUN pip install /tmp/myapp
Let’s see how we do this two-step install with Poetry.
Poetry has two relevant files.
- The standard pyproject.toml Python config file, with Poetry-specific configuration, has your high-level dependencies.
- poetry.lock contains pinned versions of all transitive dependencies.
We’ll have to copy them both in:
    FROM python:3.8-slim-buster
    WORKDIR /app
    # Install poetry:
    RUN pip install poetry
    # Copy in the config files:
    COPY pyproject.toml poetry.lock ./
    # Install only dependencies:
    RUN poetry install --no-root --no-dev
    # Copy in everything else and install:
    COPY . .
    RUN poetry install --no-dev
So far, so good: unless our dependencies change, thereby changing poetry.lock, Docker image rebuilds will be able to use cached layers because the two copied files won’t have changed.
But there’s a problem.
pyproject.toml: more than just dependencies
As mentioned above, pyproject.toml is where you list dependencies when you’re using Poetry. Let’s take a look at an example:
    [tool.poetry]
    name = "myexample"
    version = "0.1.0"
    description = ""
    authors = ["Itamar Turner-Trauring"]

    [tool.poetry.dependencies]
    python = "^3.6"
    Flask = "^1.1.2"
    # ...
Do you spot the problem?
- There’s a version field for your application.
- Every time you update that field, pyproject.toml changes.
- This invalidates the Docker cache when you rebuild your image.
- As a result, your Docker build has to reinstall all your dependencies, slowing things down.
Now, quite possibly you only update that field infrequently, and you can live with occasional slow rebuilds. But if you’re doing some sort of continuous deployment process where you’re continuously updating the version field, your Docker builds are going to be slow.
First, as mentioned above, you can choose not to care.
Second, instead of installing dependencies with Poetry, you can install them with pip. Specifically, you can use poetry export to create a standalone requirements.txt, and then just copy the requirements.txt in instead of the Poetry config files.
The downside is that you need Poetry installed both in and outside the Docker image in your CI build, and this isn’t quite how Poetry normally installs.
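As a rough sketch of this second workaround, and not a tested recipe, the build might look something like the following. It assumes you run poetry export on the host or in CI before docker build, that your pyproject.toml declares a [build-system] table (such as the poetry-core backend that Poetry normally generates) so pip can install the project itself, and that /app is simply the working directory carried over from the earlier example. On recent Poetry releases the export command comes from the separate poetry-plugin-export plugin, so you may need to install that first.

```dockerfile
# Before building, on the host or in CI (outside the image):
#   poetry export -f requirements.txt --output requirements.txt
FROM python:3.8-slim-buster
WORKDIR /app
# Copy in only the exported, fully pinned dependency list:
COPY requirements.txt ./
# Install dependencies with pip; this layer stays cached until
# the contents of requirements.txt change:
RUN pip install -r requirements.txt
# Copy in everything else, including pyproject.toml:
COPY . .
# Install the application itself; its dependencies are already installed.
# Assumes pyproject.toml has a [build-system] table that pip can use.
RUN pip install --no-deps .
```

Because pyproject.toml is only copied in by the final COPY, bumping the version field should no longer invalidate the dependency-installation layer.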
Third, you can use poetry-dynamic-versioning, a plug-in for Poetry that uses Git tags instead of pyproject.toml to set your application’s version. That way you won’t have to edit pyproject.toml to update the version.
This seems appealing until you realize you now need to copy .git into your Docker build, which has its own downsides, like larger images unless you’re using multi-stage builds.
Fourth, this is conceivably something Poetry could fix. The problem is that pyproject.toml serves multiple purposes: versions, dependencies, and more. To install just the dependencies, however, as opposed to doing a full install, you probably only need poetry.lock, so Poetry could support installing from that file alone.
I considered filing an issue, but there are already hundreds of issues in the tracker and I felt a little bad.
Why is everything broken?
A consistent theme with Docker packaging is that nothing works quite right. Docker packaging interacts badly with everything from Unix signals—a 50-year-old technology!—to quite recent projects like Poetry.
So why is that? Partially, it’s because these technologies have their own issues. For example, the interaction of Unix signals, shells, and terminals is extremely complex to the point where I immediately forget how it works every time I attempt to (re)learn it.
But the problem with Poetry is arguably down to the way Docker’s build works:
Dockerfiles are essentially glorified shell scripts, and the build system’s semantic units are files and complete command runs. There is no way in a normal Docker build to access the actually relevant semantic information: in a better build system, you’d only reinstall the dependencies that changed, rather than reinstalling all of them any time the list changed.
Hopefully a better build system will someday replace the Docker default. Until then, it’s square pegs into round holes.