A Poor Man's Continuous Deployment System Using Docker
I’ve implemented a “continuous deployment” (CD) system centered on Docker. I have not studied any standard approaches to CD. In this “system”, automated updates are not triggered by events such as code commits, but rather by scheduled cron jobs. There are no test suites to be run automatically; this is not because I consider testing inessential, but rather reflects the reality of my (team’s) code. On the other hand, the type of deployed code that is of concern here is mostly batch-mode pipelines, for which a crash in production is not the end of the world. There is no doubt that my solution is crude, but it is practical. For its purpose, it does a pretty good job.
The setting
Suppose I and a small team develop back-end data pipelines.
All of our code is hosted on Github under the organization `my-org`.
We have three repos:
- `infra`: this is a general framework/utility library that our other work uses.
- `app1`: this is a particular work component, which makes use of `infra`.
- `my-docker`: this contains Docker image definitions and related tools for `infra`, `app1` (and other repos in a position similar to that of `app1`).
All the code is in Python.
There is a top-level directory in repo `app1`, named `scripts`, that contains pipeline scripts that are deployed in production.
Our Docker images are stored in AWS Elastic Container Registry (ECR).
There are several goals for this automation system:
- Docker images are automatically re-built when appropriate. There are three occasions that demand an image rebuild:
  - Image definition in `my-docker` has changed.
  - Some code in `infra` or `app1` has changed, and the code is to be “sealed” in an image.
  - An upstream image has been re-built, requiring the rebuild of a downstream image.
- On a development machine (or deployment machine), simple commands are defined for launching Docker images. They automatically launch the latest version that exists in ECR. If the latest version is already present on the local machine, the launch is fast. Otherwise, downloading will happen, which takes some time.
- Once new code is merged into the `develop` branch of the repo `app1`, deployed pipelines in `app1/scripts/`, triggered by cron, automatically run in a properly rebuilt image that contains the new code (a sample crontab entry follows this list).
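To make the last point concrete, here is a hypothetical crontab entry on a deployment machine (the schedule and paths are made up; the `run-docker` command is developed later in this post):
# Run the deployed pipeline hourly; it always picks up the latest 'app1-prod' image.
0 * * * * /home/me/work/bin/run-docker app1-prod pipeline do-something.py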
The `my-docker` repo
This repo started like this:
my-docker/
|-- py3dev/
| |-- Dockerfile
| |-- build.sh
|-- infra-prod/
| |-- Dockerfile
| |-- build.sh
|-- infra-dev/
| |-- Dockerfile
| |-- build.sh
|-- app1-prod/
| |-- Dockerfile
| |-- build.sh
|-- app1-dev/
| |-- Dockerfile
| |-- build.sh
|-- build.sh
|-- run-docker
|-- common.sh
|-- pipeline
|-- pyscript
|-- auto-build
|-- auto-build.sh
Some explanation of the relations between our images is in order.
`py3dev` is the base of them all. It contains Python 3.6, some common Python packages, Jupyter notebook, utilities, a nice editor (`neovim`, that is), and so on. In addition, it has a user account `docker-user`. All code execution in our containers is via this user.
`infra-dev` is used for developing the code in repo `infra`. Importantly, it does not contain the `infra` code, but rather volume-maps the `infra` code directory on the local machine into the container, so that code changes can happen both from inside and outside of the container, and test runs always use the latest in-development code. If `infra` depends on some third-party libraries, they are installed in `infra-dev`. This image is based on `py3dev`.
`infra-prod` is the ‘production’ or ‘deployed’ image of `infra`. It is based on `infra-dev` (which contains the third-party dependencies of `infra`) and, importantly, contains, say, the `develop` branch of the `infra` code. The `infra` code sealed in the `infra-prod` image is read-only, appropriate for production use.
`app1-dev` is used for developing the code in repo `app1`. It contains the third-party dependencies of `app1`, is based on `infra-prod`, and volume-maps the source code of `app1` from the local machine into the container.
`app1-prod` is based on `app1-dev`. In addition to the dependencies installed in `app1-dev`, it contains a read-only copy of the `app1` code.
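To make the dev-image idea concrete, here is roughly what launching `infra-dev` by hand could look like; the host path `$HOME/work/src/infra` and the image tag are assumptions that match the conventions of the `run-docker` script shown later, which automates all of this:
# A minimal sketch; 'run-docker' (shown later) assembles this command for you.
docker run -it --rm \
    -v "$HOME/work/src/infra":/home/docker-user/work/src/infra \
    -e PYTHONPATH=/home/docker-user/work/src/infra \
    infra-dev:20180923T082316Z \
    bash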
Building the images
Let’s not worry about auto-build for now. Let’s use `build.sh` to build all the images. `build.sh` is simple. Below is the entire script:
set -Eeuo pipefail
# The sole optional argument is 'push'.
thisfile="${BASH_SOURCE[0]}"
thisdir="$( cd $( dirname ${thisfile} ) && pwd )"
(cd "${thisdir}"/py3dev && bash build.sh $@)
repos=( infra app1 )
for repo in "${repos[@]}"; do
(cd "${thisdir}"/${repo}-dev && bash build.sh $@)
(cd "${thisdir}"/${repo}-prod && bash build.sh $@)
done
The `build.sh` scripts in the image-specific directories are all similar. They all “source” in the common utility file `common.sh`, and take one optional argument indicating whether to push the images to AWS ECR.
The entire `py3dev/build.sh` is as follows:
set -Eeuo pipefail
# The sole optional argument is 'push'.
thisfile="${BASH_SOURCE[0]}"
thisdir="$( cd "$( dirname "${thisfile}" )" && pwd )"
parentdir="$( dirname "${thisdir}" )"
source "${parentdir}/common.sh"
PARENT_IMAGE=python:3.6-slim-stretch
cp -f "${parentdir}/pipeline" "${thisdir}/"
cp -f "${parentdir}/pyscript" "${thisdir}/"
build_simple "${thisdir}" "${PARENT_IMAGE}" py3dev $@
rm -f "${thisdir}/pipeline" "${thisdir}/pyscript"
The script calls the function `build_simple`, which is defined in `common.sh`, and takes one optional argument `push`. As for the scripts `pipeline` and `pyscript`, we’ll get to them later.
Here is the script `infra-dev/build.sh`:
set -Eeuo pipefail
# The sole optional argument is 'push'.
thisfile="${BASH_SOURCE[0]}"
thisdir="$( cd "$( dirname "${thisfile}" )" && pwd )"
parentdir="$( dirname "${thisdir}" )"
source "${parentdir}/common.sh"
parent_name=py3dev
parent_version=$(pull_aws_latest ${parent_name})
parent_image=${parent_name}:${parent_version}
build_dev "${thisdir}" "${parent_image}" $@
The commands `pull_aws_latest` and `build_dev` are defined in `common.sh`. The former pulls the latest version of the specified image from AWS ECR (if not already present on the local machine) and returns its tag. The tag is used to specify the parent image for the image being built.
Here is the script `infra-prod/build.sh`:
set -Eeuo pipefail
# Sole optional argument is 'push'.
thisfile="${BASH_SOURCE[0]}"
thisdir="$( cd "$( dirname "${thisfile}" )" && pwd )"
parentdir="$( dirname "${thisdir}" )"
source "${parentdir}/common.sh"
parent_name=infra-dev
parent_version=$(pull_aws_latest ${parent_name})
parent_image=${parent_name}:${parent_version}
build_prod "${thisdir}" "${parent_image}" $@
Note that it specifies `infra-dev` as the parent image. `build_prod` is another function defined in `common.sh` (we’ll get to it).
The script `app1-dev/build.sh` is identical to `infra-dev/build.sh` except that `parent_name` in the script is defined to be `infra-prod`.
The script `app1-prod/build.sh` is identical to `infra-prod/build.sh` except that `parent_name` in the script is defined to be `app1-dev`.
Utility functions in `common.sh`
Now it’s time to demystify the functions defined in `common.sh`. We start with the few functions related to finding the latest version of an image on AWS ECR, and on the local machine. Note that our images on ECR are located at `my-org/py3dev`, `my-org/infra-dev`, `my-org/infra-prod`, `my-org/app1-dev`, and `my-org/app1-prod`, at the address pointed to by `$ECR_URL`.
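Before the functions, note that `common.sh` assumes a few environment variables are already defined on the machine; a sketch of what that might look like (the values here are placeholders):
# Placeholders; fill in real values on each build/deployment machine.
export AWS_DEFAULT_REGION=us-west-2                            # used by 'log_in_aws'
export ECR_URL=123456789012.dkr.ecr.us-west-2.amazonaws.com    # registry host for the 'my-org/*' repos
export GITHUB_ACCESS_TOKEN=xxxxxxxx                            # used by 'build_prod' and the auto-build scripts
With those in place, here are the functions that look up image versions: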
function log_in_aws {
$(aws ecr get-login --no-include-email --region ${AWS_DEFAULT_REGION}) > /dev/null 2>&1
}
# Find latest tag of the specified repo on AWS.
# This assumes all tag names are already sort-able, so
# this does not get 'latest' by timestamp, rather just by tag.
function find_aws_latest_tag {
log_in_aws
name="my-org/$1"
z=$(aws ecr list-images \
--repository-name ${name} \
--query 'sort_by(imageIds,& imageTag)[-1].imageTag')
# Remove quotes.
z=${z#*\"}
z=${z%\"*}
echo "$z"
}
# Find latest tag of specified repo on the local machine.
# This assumes all tag names are already sort-able, so
# this does not get 'latest' by timestamp, rather just by tag.
function find_latest_tag {
docker images "$1" --format "{{.Tag}}" | sort | tail -n 1
}
After some experiments, we settled on tagging images by the UTC timestamp of their build time in this format: `20180923T082316Z`. This records the full year, month, day, hour, minute, and second, in UTC.
With this naming convention, we are able to tell which image is the newest solely by its tag, ignoring its created/built/pushed-at timestamps.
(At first, we retrieved the tags from AWS ECR with the latest `pushedAt` timestamp, and found the largest tag among them. We tripped on a tricky corner case. At one point, we made a small change and pushed a new image. Then we reverted that change, re-built, re-tagged, and re-pushed. This left the latest tag associated with the second-to-latest `pushedAt` time. By restricting our search to the tags with the latest `pushedAt`, we could not find the truly latest tag, so our code had the wrong idea about whether the AWS image was up-to-date. Since then, we’ve switched to this universal-tag-search approach.)
Part of the motivation for this format is that Github returns a repo’s commit time in a very similar format (e.g. `2018-09-04T23:12:50Z`); stripping the dashes and colons yields exactly our tag format, which eases comparison (see below).
Below we’ll see that the code constructs the tag in this format when building an image.
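To illustrate why this works, here is a tiny sketch, mirroring what `auto-build.sh` does near the end of this post; plain string comparison of these strings coincides with chronological comparison:
# Build tag: UTC timestamp of the build, e.g. 20180923T082316Z.
tag="$(date -u +%Y%m%dT%H%M%SZ)"

# A Github commit time such as '2018-09-04T23:12:50Z', with '-' and ':'
# stripped, lands in exactly the same format: 20180904T231250Z.
commit_time='2018-09-04T23:12:50Z'
commit_time="${commit_time//-/}"
commit_time="${commit_time//:/}"

# Lexicographic comparison is now equivalent to chronological comparison.
if [[ "${commit_time}" > "${tag}" ]]; then
    echo "the repo has commits newer than the latest image"
fi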
Next we look at the functions that deal with pulling/pushing images from/to AWS ECR.
# Pull from AWS and tag it.
function pull_aws {
log_in_aws
name="$1"
version="$2"
set -x
docker pull "${ECR_URL}/my-org/${name}:${version}" >&2
docker tag "${ECR_URL}/my-org/${name}:${version}" "${name}:${version}"
set +x
}
# Find the latest version on AWS.
# Check if it's present locally.
# If not, pull it.
# Return the tag.
function pull_aws_latest {
name="$1"
version="$(find_aws_latest_tag ${name})"
if [[ $(docker images "${name}:${version}" -q) == "" ]]; then
echo Latest version of "my-org/${name}" on AWS is "${version}" >&2
echo Could not find "${name}:${version}" locally\; pulling from AWS... >&2
pull_aws "${name}" "${version}"
fi
echo "${version}"
}
function push_to_aws {
name="$1"
version="$2"
log_in_aws
tag=${ECR_URL}/my-org/${name}:${version}
set -x
docker tag ${name}:${version} ${tag}
docker push ${tag} >&2
set +x
}
Note that when we push a local image to AWS or pull an AWS image to local machine, we make sure the local and AWS images have the same tag.
In the `build.sh` scripts, we call `build_dev` and `build_prod` to build development and production images, respectively. The difference between these functions is that the latter takes care of downloading the `develop` branch of the code from Github and unpacking it in the current directory (the Docker build context). Subsequently, the Dockerfile is responsible for copying that code into the image, building and installing as appropriate.
The image `py3dev` does not have a development vs production distinction. It is built by the function `build_simple`. Below is the code.
function build_simple {
build_dir="$1"
PARENT_IMAGE="$2"
NAME="$3"
if [[ $# -gt 3 ]]; then
push="$4"
else
push=""
fi
VERSION="$(date -u +%Y%m%dT%H%M%SZ)"
# UTC. This command with these arguments works the same on Mac and Linux.
# Version format is like this:
# 20180923T081243Z
# which indicates full datetime accurate to seconds in UTC.
# This format is chosen to facilitate comparison with Github commit time;
# see 'get_latest_branch_commit_date' in 'auto-build.sh'.
FULLNAME="${NAME}:${VERSION}"
set -x
# docker build --build-arg PARENT_IMAGE="${PARENT_IMAGE}" -t "${FULLNAME}" --no-cache "${build_dir}"
# TODO: remove '--no-cache' once things work fine.
docker build --build-arg PARENT_IMAGE="${PARENT_IMAGE}" -t "${FULLNAME}" "${build_dir}" >&2
set +x
echo
if [[ "${push}" == "push" ]]; then
push_to_aws $NAME $VERSION
fi
}
function build_dev {
build_dir="$1"
parent_image="$2"
image_name="$(basename ${build_dir})"
if [[ "${image_name}" != *-dev ]]; then
echo "'build_dev' called from directory '"${image_name}"'"
return 1
fi
shift
shift
# optional 3rd argument is 'push'.
build_simple ${build_dir} ${parent_image} ${image_name} $@
}
function build_prod {
build_dir="$1"
parent_image="$2"
image_name="$(basename ${build_dir})"
if [[ "${image_name}" != *-prod ]]; then
echo "'build_prod' called from directory '"${image_name}"'"
return 1
fi
shift
shift
# optional 3rd argument is 'push'
repo="${image_name%-prod}"
branch=develop
URL=https://github.com/my-org/${repo}/archive/${branch}.zip
GIT_TOKEN="Authorization: token ${GITHUB_ACCESS_TOKEN}"
rm -rf "${build_dir}/src.zip" "${build_dir}/${repo}-develop" "${build_dir}/src"
curl -skL --retry 3 -H "${GIT_TOKEN}" ${URL} -o "${build_dir}/src.zip"
(cd "${build_dir}" && unzip src.zip && mv -f ${repo}-${branch} src)
rm -f "${build_dir}/src.zip"
build_simple "${build_dir}" "${parent_image}" "${image_name}" $@
rm -rf "${build_dir}/src"
}
Run Docker containers
Having built the images, let’s develop a script to launch Docker containers based on the images, while taking care of any setup about the container that can be automated. Some main concerns include:
- Set up environment variables that should be available to programs running inside the container.
- Set up volume mapping between the host machine and the container. For example, a “development” container needs to have source code mapped into the container; for “production” containers, it’s important to provide mappings for data, config, and logging directories.
- Check the version of the image on the local machine as well as in AWS ECR, and pull the latest from AWS as needed.
- Provide support for running pipelines in a production container, such as taking care of log file rotation.
- Cater to the needs of particular commonly used programs.
The script is named `run-docker`. The general pattern of its usage is like this:
run-docker [docker-options] image-name command [args]
The content of `run-docker` is as follows:
#!/usr/bin/env bash
thisfile="${BASH_SOURCE[0]}"
thisdir="$( cd "$( dirname "${thisfile}" )" && pwd )"
source "${thisdir}/common.sh"
# set -Eeuo pipefail
set -o errexit
set -o nounset
set -o pipefail
# For all the directory and file names touched by this script,
# space in the name is not supported.
# Do not use space in directory and file names in ${HOME}/work and under it.
USAGE=$(cat <<'EOF'
Usage:
run-docker [--local] image-name [command [...] ]
run-docker [--local] image-name pipeline py-script-name [...]
run-docker [--local] image-name pyscript py-script-name [...]
where
`image-name` is like 'py3dev', 'infra-dev', 'infra-prod', 'app1-dev', etc.
`command` is the command to be run within the container, followed by arguments to the command.
(Default: /bin/bash)
`py-script-name`: name of the Python script, including the extension '.py'.
This script must reside directly under the 'scripts' folder in the project's repository.
`...`: additional arguments for `command` or `py-script-name`.
If `--local` is present, use local image and do not try to pull the latest from AWS.
All Docker options appear before `image-name`; after `image-name` come the command and its arguments.
EOF
)
imagename=""
uselocal="no"
command=/bin/bash
args=""
opts=""
# Parse arguments. After mandatory arguments are obtained,
# remaining arguments are stored, to be passed on.
while [[ $# -gt 0 ]]; do
if [[ "${imagename}" == "" ]]; then
if [[ "$1" == '--local' ]]; then
uselocal="yes"
else
imagename="$1"
fi
shift
else
# After `image-name`.
command="$1"
shift
args="$@"
break
fi
done
if [[ "${imagename}" == "" ]]; then
echo "${USAGE}"
exit 1
fi
# Check whether the latest version of the Docker image on AWS is available locally.
# If not, pull it from AWS and tag appropriately.
if [[ "${uselocal}" == "no" ]]; then
imageversion=$(pull_aws_latest "${imagename}")
echo using latest image "${imagename}:${imageversion}" in sync with AWS
else
imageversion=$(find_latest_tag ${imagename})
echo using latest local image "${imagename}:${imageversion}"
fi
projname="${imagename%%-*}"
# Delete longest match of "-*" from back of string.
variantname="${imagename#*-}"
# Delete shortest match of "*-" from front of string.
# Value is 'dev' or 'prod'.
if [[ "${command}" == "pipeline" ]]; then
if [[ $# -lt 1 ]]; then
echo "${USAGE}"
exit 1
fi
scriptname="$1"
fi
if [[ $(uname) == Linux && $(id -u) != 1000 ]]; then
# Linux box.
uid=$(id -u)
dockeruser=${uid}
opts="${opts} -e USER=${dockeruser} -u ${dockeruser}:docker -v /etc/group:/etc/group:ro -v /etc/passwd:/etc/passwd:ro"
else
dockeruser='docker-user'
opts="${opts} -e USER=${dockeruser} -u ${dockeruser}"
fi
dockerhomedir='/home/docker-user'
dockerworkdir="${dockerhomedir}/work"
hostworkdir="${HOME}/work"
workdir="${dockerworkdir}"
if [[ "${variantname}" == "dev" ]]; then
SRCDIR="src/${projname}"
opts="${opts} -v ${hostworkdir}/${SRCDIR}:${dockerworkdir}/${SRCDIR}"
opts="${opts} -e SRCDIR=${dockerworkdir}/${SRCDIR}"
opts="${opts} -e PYTHONPATH=${dockerworkdir}/src/${projname}"
opts="${opts} -e SCRIPTDIR=${dockerworkdir}/src/${projname}/scripts"
workdir="${dockerworkdir}/${SRCDIR}"
else
opts="${opts} -e SCRIPTDIR=/usr/local/bin/my-org/${projname}"
fi
if [[ "${command}" == "notebook" ]]; then
opts="${opts} --expose=8888 -p 8888:8888"
workdir="${dockerworkdir}/tmp"
command="jupyter notebook --port=8888 --no-browser --ip=0.0.0.0 --NotebookApp.notebook_dir='${workdir}' --NotebookApp.token=''"
elif [[ "${command}" == "py.test" ]]; then
args="-p no:cacheprovider ${args}"
elif [[ "${command}" == "pipeline" ]]; then
: # do nothing, but stay away from '-it'
else
opts="${opts} -it"
fi
opts="${opts}
-e HOME=${dockerhomedir}
--workdir ${workdir}
-e IMAGE_NAME=${imagename}
-e IMAGE_VERSION=${imageversion}
-e TZ=America/Los_Angeles
--rm --init"
LOGDIR=log/"${imagename}"
if [[ "${command}" == "pipeline" ]]; then
LOGDIR="${LOGDIR}/${scriptname}"
fi
mkdir -p "${hostworkdir}/${LOGDIR}"
opts="${opts} -v ${hostworkdir}/${LOGDIR}:${dockerworkdir}/${LOGDIR}"
opts="${opts} -e LOGDIR=${dockerworkdir}/${LOGDIR}"
DATADIR="data/${imagename}"
mkdir -p "${hostworkdir}/${DATADIR}"
opts="${opts} -v ${hostworkdir}/${DATADIR}:${dockerworkdir}/${DATADIR}"
opts="${opts} -e DATADIR=${dockerworkdir}/${DATADIR}"
CFGDIR="config/${imagename}"
mkdir -p "${hostworkdir}/${CFGDIR}"
opts="${opts} -v ${hostworkdir}/${CFGDIR}:${dockerworkdir}/${CFGDIR}"
opts="${opts} -e CFGDIR=${dockerworkdir}/${CFGDIR}"
TMPDIR="tmp"
mkdir -p "${hostworkdir}/${TMPDIR}"
opts="${opts} -v ${hostworkdir}/${TMPDIR}:${dockerworkdir}/${TMPDIR}"
opts="${opts} -e TMPDIR=${dockerworkdir}/${TMPDIR}"
#set -x
docker run ${opts} ${imagename}:${imageversion} ${command} ${args}
There are quite a few things going on here, and I’m not going to walk through it. (It is somewhat simplified from the actual version I use, but all the patterns are here.) It will be helpful to know that I assume the project directory structure on the development machine follows the recommendations here.
This script gives special attention to two commands, namely `pipeline` and `pyscript`. They are installed in the image `py3dev`, as shown in `py3dev/build.sh`. I’ll talk about them next.
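Before that, some typical invocations, to give a feel for `run-docker` (the image names are from the setup above):
# Interactive shell in the latest 'app1-dev' image (pulled from ECR if not present locally).
run-docker app1-dev

# Run the 'infra' test suite against the volume-mapped, in-development code.
run-docker infra-dev py.test

# Jupyter notebook on port 8888; PYTHONPATH points at the volume-mapped 'app1' source.
run-docker app1-dev notebook

# Use whatever image version is already on this machine; don't contact AWS.
run-docker --local app1-prod pipeline do-something.py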
The special commands `pipeline` and `pyscript`
Suppose there is a script `app1/scripts/do-something.py`. `pipeline` is a command installed into `/usr/local/bin` in the image `py3dev`. Its intended use is like this:
run-docker app1-prod pipeline do-something.py [...args...]
Typically this is launched as a cron job. Note that `pipeline` is a command residing inside the container, on the standard command path, hence this is Docker’s standard way of launching a command inside the container without first landing on an interactive console in the container. How does `pipeline` find `do-something.py`? Well, it assumes the script is in the directory pointed to by `$SCRIPTDIR`, which is set up by `run-docker`. The build spec `app1-prod/Dockerfile` makes sure to copy (or “install”) that script into the correct location.
In `run-docker`, notice that `$SCRIPTDIR` points to different locations depending on whether the image is `*-dev` or `*-prod`.
`pipeline` handles logging and adds some contextual info to the log (things like “xxx is starting with arguments … at 2018-09-28 01:02:03 PST …”).
In contrast, `pyscript` is intended for one-off use. It does not handle logging; any printout of the program appears in the console directly. `pyscript` also finds `do-something.py` via `$SCRIPTDIR`. It is used in the same way:
run-docker app1-prod pyscript do-something.py [...args...]
Below is the script `pipeline`:
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
USAGE=$(cat <<'EOF'
This script resides within a Docker image, and calls a Python program
in `${SCRIPTDIR}`.
Usage:
pipeline task [args]
`task` indicates the Python script that is to be run.
For example, if `task` is `abc.py`, then `${SCRIPTDIR}/abc.py` will be run.
`args` are optional arguments to be passed on to the Python script.
EOF
)
if [[ $# -lt 1 ]]; then
echo "${USAGE}"
exit 1
fi
task="$1"
shift
: "${SCRIPTDIR:?Environment variable \'SCRIPTDIR\' is not set}"
: "${LOGDIR:?Environment variable \'LOGDIR\' is not set}"
: "${DATADIR:?Environment variable \'DATADIR\' is not set}"
: "${IMAGE_NAME:?}"
: "${IMAGE_VERSION:?}"
PYSCRIPT="${SCRIPTDIR}"/${task}
echo | multilog s1000000 n30 "${LOGDIR}"
echo | multilog s1000000 n30 "${LOGDIR}"
echo "========================================" | multilog s1000000 n30 "${LOGDIR}"
date --utc +'%Y-%m-%d %H:%M:%S UTC' | multilog s1000000 n30 "${LOGDIR}"
echo in Docker image "${IMAGE_NAME}:${IMAGE_VERSION}" | multilog s1000000 n30 "${LOGDIR}"
echo "$(env)" | multilog s1000000 n30 "${LOGDIR}"
echo | multilog s1000000 n30 "${LOGDIR}"
echo starting task \`${task}\` | multilog s1000000 n30 "${LOGDIR}"
if [[ "${task}" != *.py ]]; then
echo Unrecognized script "'${task}'" --- did you forget the extension? | multilog s1000000 n30 "${LOGDIR}"
echo Aborting... | multilog s1000000 n30 "${LOGDIR}"
exit 1
fi
echo python -u ${PYSCRIPT} $@ | multilog s1000000 n30 "${LOGDIR}"
echo "----------------------------------------" | multilog s1000000 n30 "${LOGDIR}"
echo | multilog s1000000 n30 "${LOGDIR}"
python -u ${PYSCRIPT} $@ 2>&1 | multilog s1000000 n30 "${LOGDIR}"
echo | multilog s1000000 n30 "${LOGDIR}"
echo "----------------------------------------" | multilog s1000000 n30 "${LOGDIR}"
date --utc +'%Y-%m-%d %H:%M:%S UTC' | multilog s1000000 n30 "${LOGDIR}"
echo task \`${task}\` finished | multilog s1000000 n30 "${LOGDIR}"
echo "========================================" | multilog s1000000 n30 "${LOGDIR}"
The log rotation tool `multilog` (see Simple Rotating Log Capture) is installed in `py3dev/Dockerfile`.
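(`multilog` is part of the `daemontools` package; a sketch of the install step, inside a `RUN` instruction and assuming the Debian-based `python:3.6-slim-stretch` parent used above, would be something like:)
apt-get update \
    && apt-get install -y --no-install-recommends daemontools \
    && rm -rf /var/lib/apt/lists/*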
The script `pyscript` is slightly simpler since it does not worry about logging:
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
USAGE=$(cat <<'EOF'
This script resides within a Docker image, and calls a Python program
in `${SCRIPTDIR}`.
Usage:
pyscript task [args]
`task` indicates the Python script that is to be run.
For example, if `task` is `abc.py`, then `${SCRIPTDIR}/abc.py` will be run.
`args` are optional arguments to be passed on to the Python script.
EOF
)
if [[ $# -lt 1 ]]; then
echo "${USAGE}"
exit 1
fi
task="$1"
shift
: "${SCRIPTDIR:?Environment variable \'SCRIPTDIR\' is not set}"
PYSCRIPT="${SCRIPTDIR}"/${task}
if [[ "${task}" != *.py ]]; then
echo Unrecognized script "'${task}'" --- did you forget the extension?
echo Aborting...
exit 1
fi
python -u ${PYSCRIPT} $@
Below is the segment of `py3dev/Dockerfile` that installs `pipeline` and `pyscript`:
COPY pipeline /usr/local/bin/
RUN chmod +x /usr/local/bin/pipeline
COPY pyscript /usr/local/bin/
RUN chmod +x /usr/local/bin/pyscript
In `app1-prod/Dockerfile`, the scripts under `app1/scripts/` are copied into `/usr/local/bin/my-org/app1`, as is hinted at in `run-docker`.
Auto-build the images
Now that how to build and use the images is behind us, we can finally turn to auto-building them. The main points of the idea are:
- Building images needs to use the latest code of the repo `my-docker`. A simple way to do this in a cron job is to download the latest code every time the cron job runs.
- Provide a script in `my-docker` that does version checking, image building, and all that.
- Provide a more stable, simple script that handles downloading the code of `my-docker`. This script is what is called by cron.
The downloader is as simple as the following:
#!/usr/bin/env bash
thisfile="${BASH_SOURCE[0]}"
thisdir="$( cd $( dirname ${thisfile} ) && pwd )"
export GITHUB_ACCESS_TOKEN="fill this out in deployed copy"
: "${GITHUB_ACCESS_TOKEN:?}"
# Also make sure other environment variables required by `common.sh` are defined,
# possibly by 'sourcing' in a file containing definition of the variables.
echo
echo ======================================================
echo $(date)
echo fetching the latest of \'my-docker:master\'...
echo
URL=https://github.com/my-org/my-docker/archive/master.zip
GIT_TOKEN="Authorization: token ${GITHUB_ACCESS_TOKEN}"
cd /tmp
rm -f my-docker.zip
curl -skL --retry 3 -H "${GIT_TOKEN}" ${URL} -o my-docker.zip
rm -rf my-docker-master
unzip my-docker.zip
echo
echo ----------------------------------------
echo finished fetching \'my-docker:master\'
echo starting to build images...
echo
cd my-docker-master
bash auto-build.sh
echo
echo finished building images.
echo $(date)
echo -----------------------------------------
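A deployed copy of this downloader is what cron actually invokes. A hypothetical crontab entry (path and schedule are made up) might look like this:
# Nightly image refresh; 'auto-build' is a deployed copy of the downloader above.
30 1 * * * bash /home/me/deploy/auto-build >> /home/me/work/log/auto-build.log 2>&1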
The heavy-lifter `auto-build.sh` is listed below.
set -Eeuo pipefail
thisfile="${BASH_SOURCE[0]}"
thisdir="$( cd $( dirname ${thisfile} ) && pwd )"
source "${thisdir}/common.sh"
: "${GITHUB_ACCESS_TOKEN:?}"
GIT_TOKEN="Authorization: token ${GITHUB_ACCESS_TOKEN}"
function find_latest_branch_commit {
repo="$1"
branch="$2"
url="https://api.github.com/repos/my-org/${repo}/git/refs/heads/${branch}"
z=$(curl -s -H "${GIT_TOKEN}" ${url})
sha=$(grep '"sha": ' <<< "$z")
sha=${sha##*\"sha\": \"}
sha=${sha%\",}
echo "${sha}"
}
function get_commit {
repo="$1"
sha="$2"
url="https://api.github.com/repos/my-org/${repo}/git/commits/${sha}"
z=$(curl -s -H "${GIT_TOKEN}" ${url})
echo "${z}"
}
function get_latest_branch_commit {
repo="$1"
branch="$2"
get_commit "${repo}" "$(find_latest_branch_commit $repo $branch)"
}
function get_latest_branch_commit_date {
z="$(get_latest_branch_commit $@)"
z=$(grep "\"date\": \"" <<< "$z" | head -1)
z="${z#*\"date\": \"}"
z="${z%\"}" # string formatted like '2018-09-04T23:12:50Z'
z="${z//-/}"
z="${z//:/}" # string formatted like '20180904T231250Z'
echo "$z"
}
function get_latest_aws_tag_date {
z=$(find_aws_latest_tag "$1") # tag is formatted like '20180904T231250Z'
echo "$z"
}
function main {
if [[ "$(get_latest_branch_commit_date my-docker master)" > "$(get_latest_aws_tag_date py3dev)" ]]; then
echo
echo "'my-docker:master' has updated; re-building everything..."
echo
bash "${thisdir}/build.sh"
else
if [[ "$(get_latest_branch_commit_date infra develop)" > "$(get_latest_aws_tag_date infra-prod)" ]]; then
echo
echo "'infra:develop' has updated; re-building most images..."
echo
(cd "${thisdir}"/infra-prod && bash build.sh push)
repos=( app1 )
for repo in "${repos[@]}"; do
(cd "${thisdir}"/${repo}-dev && bash build.sh push)
(cd "${thisdir}"/${repo}-prod && bash build.sh push)
done
else
repos=( app1 )
for repo in "${repos[@]}"; do
if [[ "$(get_latest_branch_commit_date ${repo} develop)" > "$(get_latest_aws_tag_date ${repo}-prod)" ]]; then
echo
echo "'${repo}:develop' has updated; re-building '${repo}-prod'"
echo
(cd "${thisdir}/${repo}" && bash build.sh push)
fi
done
fi
fi
return 0
}
main