Drone.io build time optimisation
Setting the stage: organisational circumstances
Early stage: We started as a two-person development team working on two containerised services. While we were concerned about scale, we weren't concerned about scaling the team itself, so we built our services using simple makefiles, without considering the future cost.
Growth: After a few months we found ourselves working with ~20 developers from 3 different teams, across several countries, all on the same project. We now had 25 containerised services (and counting), all communicating with each other.
Taming the beast: After giving up on makefiles and moving to drone.io CD, a typical build took ~6 minutes. On a normal day, every developer initiates 5-10 builds (git push / tag events). So we decided to tackle that pain.
Starting point: a cross-section
Multiple services, based on a large Ubuntu python image
Multiple Dockerfiles (one per service) with many commonalities
Many unused libraries pre-installed on our base image
Compile-time dependencies installed in the containers that are not needed at runtime in production
A CD system that does not support docker cache between steps
A CD system located in a different region than our docker registry
The image that was tested was different from the image pushed to our registry
We started by dropping our python base image, switching to alpine, and installing our own dependencies:
RUN apk add --update \
This image will be the foundation for all of our python-based images. We decided not to use the python-alpine image, because we wanted to control exactly what is installed on the image.
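For illustration, a minimal base-image Dockerfile along these lines might look as follows. This is a sketch, not our actual file: the alpine version and the package list are assumptions, and our real list was longer.

```dockerfile
# Hypothetical sketch of an alpine-based python base image.
# Package names below are assumptions for illustration only.
FROM alpine:3.8

RUN apk add --update --no-cache \
        python \
        py-pip \
        python-dev \
        build-base

# Compile-time packages like build-base can later be removed
# (or isolated in a multi-stage build) so they don't ship to prod.
```

Keeping the `apk add` in a single `RUN` instruction keeps the image to one layer for all system packages.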
Cache the built image
Before the change, the build step was running a few commands in the context of the build agent:
.drone.yml - build step (before change)
- virtualenv /app/env
- apt-get update && apt-get install -y --no-install-recommends nginx libnss3 libnss3-dev
- apt-get update -y
- apt-get install -y python-dev python-pip python-setuptools
- /app/env/bin/pip install -r ./requirements.txt
- /app/env/bin/pip install supervisor supervisor-stdout uwsgi
- /app/env/bin/pip install -e .
- cp ./ops/etc/nginx.conf /etc/nginx/
- cp ./ops/etc/supervisord/api.conf /etc/supervisord.conf
- cp ./ops/etc/uwsgi.ini /etc/uwsgi.ini
So we were running a few commands that were not built as docker layers. This caused each and every build to reinstall the project's entire requirements.txt, for instance.
So we moved to our base image, installed docker on it just for build purposes, and shared the docker.sock file so that layers would be cached between builds. (For every build step, drone launches a new agent: a container that runs the build step. It won't mount the docker.sock file by default.)
.drone.yml - build step (after change)
- apk --update add docker
- docker build -t cloudlock/svc-xxx:build .
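Putting it together, the build step in a drone 0.x-era .drone.yml might look roughly like this. The step name and base-image name are illustrative assumptions; the volume mount is the important part:

```yaml
build:
  image: cloudlock/base-image   # hypothetical name for our alpine base image
  commands:
    - apk --update add docker
    - docker build -t cloudlock/svc-xxx:build .
  volumes:
    # Share the host docker daemon so image layers are cached between builds
    - /var/run/docker.sock:/var/run/docker.sock
```

Because the build talks to the host's docker daemon through the shared socket, layers built in one run are still present in the next one.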
The Dockerfile already contained all the instructions that previously lived in that step, so building the image from the Dockerfile does the work.
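To get the most out of the layer cache, it helps to copy and install requirements.txt before copying the application code, so the dependency layer is invalidated only when the requirements actually change. A rough sketch (the base-image name and paths are assumptions):

```dockerfile
# Hypothetical base image name
FROM cloudlock/base-image

# Dependencies first: this layer is reused as long as
# requirements.txt is unchanged between builds.
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

# Application code last: changes here do not invalidate
# the dependency layer above.
COPY . /app
```

With this ordering, a typical code-only push rebuilds just the final layer instead of reinstalling every dependency.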
Making drone.io ECR plugin use docker cache
Not sharing the docker.sock file in the push step caused the push to our docker registry to lose the cache on every single build, so we added the volume:
.drone.yml - push step
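The original snippet is not reproduced here; a sketch of what such a push step with the shared socket could look like, with plugin settings that are assumptions based on the drone ECR plugin:

```yaml
publish:
  ecr:
    image: plugins/ecr          # drone ECR plugin
    repo: cloudlock/svc-xxx     # assumed repository name
    tags:
      - latest
    volumes:
      # The same docker.sock mount, so the plugin reuses cached layers
      - /var/run/docker.sock:/var/run/docker.sock
```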
This might sound obvious, but only after dealing with it did we figure out that we didn't have cache between two builds.
Moving our CD instances to the same region as our docker registry saved us ~5 seconds per build. The move took 2 hours of work by one of our dev-ops team. Some will say this wasn't worth the time, but as wasted developer time builds up, it was.
To sum things up
From time to time, we as developers need to tackle technical problems that are not directly related to the product. The most trivial example is ramping up a new developer on the team. Not addressing trivial problems such as this one can be expensive. Having a ramp-up procedure in place before a new developer joins the team will make her life, and the team's life, much more productive and easy.
Cutting 5 minutes from our build time might be nothing for 2 developers building 3 times a day, but as the team scaled up, it became expensive. Addressing this issue, which is not considered all that complicated, saved us expensive developer time and shortened our deployment iterations.
Hope you enjoyed the read!