
Docker Containerization

Virtualization

For our analyses to be open, they need to run on any system while always giving the same results. This can be achieved using a technology called virtualization. Virtualization means that a piece of software, e.g. an operating system, is run as a virtual computer inside another operating system. This way, you can, for example, run Linux on a computer that normally runs Windows. This is usually done with software such as VirtualBox, which sets up a virtual machine. The virtual machine then acts like a real computer and can even interact with the operating system it is embedded in.

Containerization

For our purposes, we make use of containerization, which is a special case of virtualization. Containers are, in essence, small virtual machines. Though there are some technical differences, the important difference for us is a philosophical one: instead of setting up an operating system that contains software for all kinds of purposes, a container only contains the software necessary for a specific task. With the help of Docker, we can set up a self-contained system that has all the software we need to run our analyses. This also enables us to use functions from several analysis software packages through Nipype, independent of our operating system. Nipype works as an interface between these packages and is based on the open-source programming language Python.

What is Docker?

Docker is a containerization tool. As mentioned above, containerization works similarly to a virtual machine, in the sense that you simulate an operating system that is different from the one you are using on your computer. It runs in a container. A container is a self-contained system that has everything needed to run software: an operating system, a file system and code. Docker enables us to construct such a container and run our analysis in it.
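
To get a feel for this, here is a minimal sketch of what working with a container looks like on the command line (this assumes Docker is already installed; the ubuntu image is just an example):

    # start a container from the ubuntu image and open an interactive shell in it
    docker run -it ubuntu bash

    # inside the container, you are in a self-contained Linux system
    cat /etc/os-release

    # leaving the shell stops the container
    exit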

Singularity is another containerization tool; it is often used to run containers on shared IT infrastructure or in a cloud. This can come in handy for analyses that require a lot of computing power. While in theory both tools can run in the cloud, Singularity is often preferred for this purpose due to security concerns.

With Docker, the user has full access to the host system, which can lead to unwanted outcomes such as accidentally deleting or overwriting files. Here, we concentrate on Docker. Working with Singularity is quite similar, however, so it shouldn't take much effort to get to know it after learning Docker.
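
As an aside, Docker images can usually be run with Singularity as well. A minimal sketch (the exact commands depend on your Singularity version; ubuntu is again just an example image):

    # convert a Docker Hub image into a Singularity image file
    singularity pull docker://ubuntu

    # open a shell inside the resulting container
    singularity shell ubuntu_latest.sif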

Why Docker?

Why should we run our analysis in a Docker container? Using containerization software has several advantages.

  • We can share the complete computing environment (all software and dependencies)

  • Collaboration works across operating systems (between researchers using Windows, Mac OS or Linux)

With Docker, we can set up a processing environment with all the software needed for our analysis in something called a Docker image. The container is the instantiation of our Docker image. Docker images can be shared on the internet on a platform called Docker Hub. Other researchers can download these images, for instance to use them on their own data. Downloading an image is called “pulling”.
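
On the command line, pulling an image from Docker Hub and starting a container from it looks like the sketch below (nipype/nipype is just one example of an image shared on Docker Hub):

    # download ("pull") an image from Docker Hub
    docker pull nipype/nipype

    # list the images available on your machine
    docker images

    # start a container from the pulled image
    docker run -it nipype/nipype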

If you want to know how to install Docker on your system, please follow this tutorial, which is part of a very good tutorial series on Nipype.

Neurodocker

In order to make a Docker image, we need to write a Dockerfile, which is basically a recipe for how to build our container. Fortunately, Neurodocker is a tool that makes this task very simple. It runs directly in Docker as a Docker image itself. See the part on Neurodocker in the Nipype tutorial for how to use it.
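
To illustrate, generating a Dockerfile with Neurodocker might look like the sketch below. The flags and software options shown here are assumptions that vary between Neurodocker versions, so check them against the Neurodocker documentation before use:

    # run Neurodocker (itself distributed as a Docker image) to generate a Dockerfile;
    # the base image, package manager and Miniconda options are example values
    docker run --rm repronim/neurodocker generate docker \
        --base-image debian:bullseye \
        --pkg-manager apt \
        --miniconda version=latest conda_install="nipype" \
        > Dockerfile

    # build an image called my-analysis from the generated Dockerfile
    docker build --tag my-analysis .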