A Docker Stack for Personal and Team Projects in Python --- Part 1
I started using Docker in early 2016. After a month or two of learning it, I never again did code development outside of Docker. My Docker workflow has evolved over time, and the main driver has been designing for teamwork. So far I have had three major design rounds, which happened in early 2018 (for one team), early 2019 (for another team), and late 2020 (a major simplifying overhaul). By now, I feel the stack has reached a relatively stable and good stage (as I have felt previously!), so I decided to write down its main ideas. The stack for my personal experiments is available on GitHub and will be referred to in this post. The principles are the same as those for a team stack.
To start off, what are the design goals for a Docker workflow? In hindsight, I think these two are the most important:
- It must be very easy to use.
- It must enforce or at least encourage standard, good practices, as opposed to homegrown, nonstandard ways that “also work” for the immediate task.
What is “easy to use”? These characteristics come to mind:
- It must automate everything that can be automated. Moreover, the user is better off not seeing the things that are automated away. For example, the user should not need to copy a 100-line YAML config file just to customize two lines of it.
- It must be fairly stable. You cannot announce every few weeks, “Hi team, a new version of our Docker script is pushed. Please upgrade it on every machine and project where it is used.”

  However, it is nearly impossible to be absolutely bug-free, and downright impossible to foresee every future need. We do need to make fixes and small changes from time to time. The design must allow some evolution, yet not require the user to upgrade in order to get the new functionality. What? Well, it’s like a service with a stable API.
- The user interface must be minimal, that is, the user needs to remember only one or two commands with one or two options.
How can we achieve such automation and simplicity? As the stack evolved, it ended up with two techniques underpinning most of the solutions:
- Give Docker images auto-generated, date-time based, sortable versions (or “tags” in Docker nomenclature). In most cases (or default cases), we would want to use the latest version of an image. With this versioning scheme, the latest version is easily found by code.
- Use a utility image to host scripts, but run the scripts outside of their hosting images. Suppose a Bash script is contained in a string `${script}`; then we can run it this way:

  $ bash -c "${script}" -- [args]

  where `[args]` are any arguments the script takes. It follows immediately that if image `abc:20210421` has a script `script.sh` in `/usr/local/bin`, we can use the script outside of Docker as follows:

  $ cmd="$(docker run --rm abc:20210421 cat /usr/local/bin/script.sh)"
  $ bash -c "${cmd}" -- [args]

This is a critical little trick that makes the stack stable to its user, yet extensible in its behavior. We’ll see how it works soon.
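
To see the mechanics of the `bash -c` trick without involving Docker, here is a minimal sketch. The script body and the function name `run_script` are made up for illustration; in the stack, the string would come from `docker run ... cat ...` instead. Note that the first word after the command string (here `--`) fills `$0` of the inner shell, so the real arguments start at `$1`.

```shell
#!/usr/bin/env bash
# Hypothetical script body held in a string. In the stack it would be fetched with
#   docker run --rm IMAGE cat /path/to/script
script='printf "%s\n" "zeroth=$0" "count=$#" "first=${1:-}"'

# Illustrative wrapper: run the string as a script on the host.
# The "--" is consumed as $0, so "$@" arrives as $1, $2, ...
run_script() {
    bash -c "${script}" -- "$@"
}

run_script alpha beta
```

Because the caller only ever holds the string, the script's content can change between image versions without the caller's code changing at all.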
On the high level, the Docker stack consists of these main components:
- Some requirements and assumptions about the directory layout on the host machine and inside the project repo. These requirements are easy to satisfy and are not restrictive in terms of capabilities.
- A Docker image called `tiny`. This image contains a few commands that are expected to be very stable, such that other utilities refer to this image with a hard-coded tag (without worrying about updating it often), and use the commands contained in it. In particular, this image contains a command that finds the latest tag of a specified Docker image. Thanks to the sortable image versions, this script is stable.
- A Docker image called `mini`. This image contains additional commands for building project images and running project containers. The command internals may change from time to time, but their names should not change, and their user interfaces should be stable. Now, `tiny` and `mini` form something like a cache hierarchy. User code will use the very stable `tiny` to find, dynamically, the latest version of `mini`, and copy commands out of `mini` to use. (I came to use this pattern consciously and extensively only in the late 2020 iteration.)
- Some base images for all projects to build on.
- A project template for new projects to copy and start from. This repo maintains the single source of truth for things like the code structure in a project repo, location of the Dockerfile, the build script, etc.
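
The `tiny` to `mini` lookup chain described in the list above might be wired together roughly as follows. This is only a sketch under stated assumptions: the function name `run_from_mini` is hypothetical, and so is the idea that commands in `mini` live under `/usr/tools` (the post only states that location for `tiny`). The pinned tag `zppz/tiny:21.01.02` and the `find-image` command are the ones used later in this post.

```shell
#!/usr/bin/env bash
# Sketch of the tiny -> mini lookup chain.
# Assumptions (not from the post): the function name, and that commands in
# `mini` are stored under /usr/tools like those in `tiny`.

TINY="zppz/tiny:21.01.02"   # hard-coded tag; `tiny` is expected to change rarely

run_from_mini() {
    local command_name="$1"
    shift
    local find_image mini script
    # 1. Copy the finder script out of the pinned `tiny` image.
    find_image="$(docker run --rm "${TINY}" cat /usr/tools/find-image)" || return 1
    # 2. Run it on the host to resolve the newest `mini` tag.
    mini="$(bash -c "${find_image}" -- zppz/mini)" || return 1
    # 3. Copy the requested command out of that `mini` and run it on the host.
    script="$(docker run --rm "${mini}" cat "/usr/tools/${command_name}")" || return 1
    bash -c "${script}" -- "$@"
}

# Usage (requires Docker and the images; the command name is a placeholder):
#   run_from_mini some-command arg1 arg2
```

The only thing a user script has to hard-code is the `tiny` tag; everything downstream of it is resolved at run time.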
Next, I’ll describe some details of `tiny` and the base images.
By the way, since `tiny` and `mini` are used in many places, and a newer version of `mini` may be downloaded automatically at any time, these images should be as small as possible. Since both images contain only shell scripts, and do not even require the Bash shell, they are built on a very small base image, `busybox`. In fact, both images are below 1.5 MB in size.
The image `tiny` contains two types of commands. The first type generates version strings to be used by image building scripts. Specifically, there are two commands for two variants of the versioning scheme:
$ docker run --rm zppz/tiny:21.01.02 make-date-version
21.04.25
$ docker run --rm zppz/tiny:21.01.02 make-datetime-version
20210425-233226
The date-based versions are recommended for utility libraries that are not released so often. The datetime-based versions are recommended for “product projects”, which may have frequent builds.
Note that I used the exact `tiny` image `zppz/tiny:21.01.02`. User scripts can do the same thing because the image is very stable.
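
For reference, both version formats shown above can be produced with the standard `date` utility. The functions below are illustrative stand-ins, not the actual scripts in `tiny`; in particular, using UTC is an assumption made here so that different machines agree on the value.

```shell
#!/usr/bin/env bash
# Illustrative stand-ins for make-date-version and make-datetime-version.
# Assumption: UTC is used, so that all build machines produce comparable values.

make_date_version() {
    date -u +%y.%m.%d          # format like 21.04.25
}

make_datetime_version() {
    date -u +%Y%m%d-%H%M%S     # format like 20210425-233226
}

make_date_version
make_datetime_version
```

Both formats are fixed-width, zero-padded, and ordered from the most significant field to the least, which is what makes a plain string sort agree with chronological order.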
The second type of commands concerns finding the latest tag of a specified Docker image. There are two commands of this type, namely `/usr/tools/find-image` and `/usr/tools/find-local-image`. They are designed to run outside of Docker. For example,
$ cmd="$(docker run --rm zppz/tiny:21.01.02 cat /usr/tools/find-image)"
$ bash -c "${cmd}" -- zppz/mini
zppz/mini:21.04.25
$ cmd="$(docker run --rm zppz/tiny:21.01.02 cat /usr/tools/find-local-image)"
$ bash -c "${cmd}" -- zppz/mini
zppz/mini:21.04.25
This is also how they are intended to be used in user scripts. Whereas `find-image` finds the latest tag across the local machine and a remote image registry, `find-local-image` finds the latest on the current machine only.
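
To illustrate why sortable tags make this lookup simple, here is a rough sketch of a local-only search. It is not the actual `find-local-image` script: the function names `latest_tag` and `find_local_latest` are hypothetical, and the sketch assumes a given repository uses only one of the two version formats.

```shell
#!/usr/bin/env bash
# Sketch of "latest tag by sorting"; not the actual find-local-image script.

# Read tags on stdin, keep those that look like date or datetime versions,
# and print the largest one in byte order.
latest_tag() {
    grep -E '^([0-9]{2}\.[0-9]{2}\.[0-9]{2}|[0-9]{8}-[0-9]{6})$' \
        | LC_ALL=C sort \
        | tail -n 1
}

# Hypothetical wrapper that lists local tags of one repository via the Docker CLI.
find_local_latest() {
    local repo="$1" tag
    tag="$(docker images --format '{{.Tag}}' "${repo}" | latest_tag)"
    [ -n "${tag}" ] && echo "${repo}:${tag}"
}

# Demo without Docker: tags that do not match the version pattern are ignored.
printf '%s\n' 21.01.02 latest 21.04.25 20.12.30 | latest_tag
```

A search that also covers a remote registry would need to query that registry for its tag list, which is left out here.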
The base images contain basic, commonly needed tooling so that project repos share a common baseline and don’t need to repeat the same setup. As of now, the images contain:
- A carefully chosen base, currently `ubuntu:20.10`. Some considerations in this choice include the distro’s default Python version (3.8), compatibility with `cuda`, compatibility with the team’s build environment (is it in the Debian lineage or the CentOS lineage?), etc. (For a team, `ubuntu:20.04` might be preferable as it is a “Long Term Support” version.)
- A non-root user account `docker-user`, in group `docker-user`, with home directory `/home/docker-user`. The intention is that downstream images always run as this user.
- Very basic Linux packages such as `curl`, `unzip`, etc. Note that the base images should strike a balance between light weight and usefulness. For example, I do not recommend having `vim` and `git` in there, because it is not recommended to develop code within a container (it is rather for executing code).
- Python 3.8.
- A few Python packages related to testing (`pytest`) and debugging (`pudb`).
- A better Python REPL called `ptpython`.
- The Jupyter notebook package.
- Nice configuration for the things installed, such as an informative Bash prompt, `ls` coloring, Jupyter behavior, integration between `pudb` and `pytest`, etc.
That’s all for this post. Please read the subsequent parts:
- Part 1: overview (this article).
- Part 2: building images.
- Part 3: running containers.
- Part 4: building images for a separately installable Python package. (In contrast, the images built in “Part 2” do not provide a separately installable package. In order to use the Python package therein, one needs to use these images as base image.)