Conducto uses Docker containers to provide portability and scalability. A Docker image is a template that packages your code with fully-defined OS and dependencies, while a container is a running instance of one.

Docker can be intimidating for newcomers and awkward for professionals, so Conducto has a number of features that simplify using it for pipelines.

Images are defined by a Dockerfile, which builds up the execution environment step by step. Detailed tutorials can be found elsewhere, but a Dockerfile commonly has a few components:

  • A base image to build on

FROM python:3.7
  • Actions to change or enhance the environment

RUN pip install pandas
  • Commands to put your own code into the container

COPY path/on/my/computer path/in/image
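
The three components above can also be assembled into Dockerfile text programmatically. The following is a minimal sketch in plain Python; `build_dockerfile_text` and its `/app` default destination are hypothetical names chosen for illustration and are not part of Conducto. The string it produces is the kind of value that the dockerfile_text parameter of conducto.Image accepts.

```python
from typing import Sequence


def build_dockerfile_text(base_image: str,
                          pip_packages: Sequence[str] = (),
                          copy_src: str = ".",
                          copy_dst: str = "/app") -> str:
    """Assemble Dockerfile text from the three common components.

    Hypothetical helper for illustration only; not a Conducto API.
    """
    lines = [f"FROM {base_image}"]  # a base image to build on
    if pip_packages:
        # an action that enhances the environment
        lines.append("RUN pip install " + " ".join(pip_packages))
    # a command that puts your own code into the image
    lines.append(f"COPY {copy_src} {copy_dst}")
    return "\n".join(lines)


text = build_dockerfile_text("python:3.7", ["pandas"], ".", "/my/path")
print(text)
```

Keeping the components as separate arguments makes it easy to vary only the base image or the package list between pipelines.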

Image Definition

You can specify images for each node (or defaults for the entire pipeline or subtree) using the image parameter of conducto.Exec(). Here are some examples of useful images.

Extending a base image with packages and user-code

# Make a Docker Image based on python:3.7, using all the files in '.' as
# the build context, and `pip install` conducto and pandas.
import conducto as co
co.Image("python:3.7", copy_dir=".", install_pip=["conducto", "pandas"])

Auto-building a Dockerfile

# Run `docker build` on '../../Dockerfile'.
import conducto as co
co.Image(dockerfile="../../Dockerfile")

Specify the text of the Dockerfile programmatically.

# Build:
#     FROM python:3.7
#     COPY . /my/path
# using this file's directory as the context.
import conducto as co
co.Image(dockerfile_text="FROM python:3.7\nCOPY . /my/path", context=".")

Use a git repo as your build context. Very useful for CI/CD.

# Use the 'main' branch of Conducto's public package as your build context.
import conducto as co
co.Image("python:3.7", copy_branch="main",
class conducto.Image(image=None, *, dockerfile=None, dockerfile_text=None, docker_build_args=None, context=None, copy_repo=None, copy_dir=None, copy_url=None, copy_branch=None, docker_auto_workdir=True, install_pip=None, install_npm=None, install_packages=None, install_docker=False, path_map=None, shell='__auto__', name=None, git_urls=None, instantiation_directory=None, reqs_py=None, reqs_npm=None, reqs_packages=None, reqs_docker=False, **kwargs)
  • image (str) – Specify the base image to start from. Code can be added with the various copy_* parameters, and packages with the install_* parameters.

  • dockerfile (str) – Use instead of image to pass a path to a Dockerfile. Relative paths are resolved relative to the file where this code is written. Unless context is specified, the directory of the Dockerfile is used as the build context.

  • dockerfile_text (str) – Directly pass the text of a Dockerfile rather than linking to one that’s already written. If you want to use ADD or COPY you must specify context explicitly.

  • docker_build_args (dict) – Dict mapping build-argument names to values, passed to docker build via --build-arg.

  • docker_auto_workdir (bool) – Set the work-dir to the destination of copy_dir. Default: True

  • context (str) – Use this to specify a custom docker build context when using dockerfile.

  • copy_repo (bool) – Set to True to automatically copy the entire current Git repo into the Docker image. Use this so that a single Image definition can use either local code or code fetched from a remote repo. In copy_dir mode (the normal use), it uses local code by setting copy_dir to the root of the Git repo that contains the calling code. In copy_url mode, specify copy_branch to use a remote repository; this is commonly done for CI/CD, and copy_url is then auto-populated.

  • copy_dir (str) – Path to a directory. All files in that directory (and its subdirectories) will be copied into the generated Docker image.

  • copy_url (str) – URL to a Git repo. Conducto will clone it and copy its contents into the generated Docker image. Authenticate to private GitHub repos with a URL like https://{user}:{token}…. See secrets for more info on how to store this securely. Must also specify copy_branch.

  • copy_branch (str) – A specific branch name to clone. Required if using copy_url.

  • path_map (dict) – Dict that maps external_path to internal_path. Needed for live debug and for passing callables to Exec & Lazy. It can be inferred from copy_dir, copy_url, or copy_repo; if not using one of those, you must specify path_map explicitly. This typically happens when a user-generated Dockerfile copies the code into the image.

  • install_pip (List[str]) – List of Python packages for Conducto to pip install into the generated Docker image.

  • install_npm (List[str]) – List of npm packages for Conducto to npm i into the generated Docker image.

  • install_packages (List[str]) – List of packages to install with the appropriate Linux package manager for this image’s flavor.

  • install_docker (bool) – If True, install Docker during build time.

  • shell (str) – Which shell to use in this container. Defaults to co.Image.AUTO to auto-detect. AUTO will prefer /bin/bash when available, and fall back to /bin/sh otherwise.

  • name (str) – Name this Image so other Nodes can reference it by name. If no name is given, one will automatically be generated from a list of our favorite Pokemon. I choose you, angry-bulbasaur!

  • instantiation_directory (str) – The directory of the file in which this image object was created. This is used to determine where relative paths passed into co.Image are relative from. This is automatically populated internally by conducto.

  • reqs_py – Deprecated. Use install_pip instead.

  • reqs_npm – Deprecated. Use install_npm instead.

  • reqs_packages – Deprecated. Use install_packages instead.

  • reqs_docker – Deprecated. Use install_docker instead.
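
Several of the parameters above constrain one another: copy_url must be accompanied by copy_branch, Dockerfile text that uses ADD or COPY needs an explicit context, and the reqs_* names are deprecated. The sketch below restates those rules as a small checker in plain Python. `check_image_args` is a hypothetical function written for illustration, the example URL is a placeholder, and Conducto's own validation may behave differently.

```python
import re

# Deprecated parameter names and their replacements, per the list above.
DEPRECATED = {
    "reqs_py": "install_pip",
    "reqs_npm": "install_npm",
    "reqs_packages": "install_packages",
    "reqs_docker": "install_docker",
}

# Matches a Dockerfile line that starts with ADD or COPY (case-insensitive).
ADD_OR_COPY = re.compile(r"^\s*(ADD|COPY)\b", re.MULTILINE | re.IGNORECASE)


def check_image_args(**kwargs):
    """Return a list of problems found in co.Image-style keyword arguments.

    Hypothetical checker for illustration; not part of Conducto.
    """
    problems = []
    # copy_url must be accompanied by copy_branch.
    if kwargs.get("copy_url") and not kwargs.get("copy_branch"):
        problems.append("copy_url requires copy_branch")
    # Dockerfile text that uses ADD or COPY needs an explicit build context.
    text = kwargs.get("dockerfile_text")
    if text and ADD_OR_COPY.search(text) and not kwargs.get("context"):
        problems.append("dockerfile_text with ADD/COPY requires context")
    # Flag deprecated names and point at their replacements.
    for old, new in DEPRECATED.items():
        if old in kwargs:
            problems.append(f"{old} is deprecated; use {new}")
    return problems


print(check_image_args(image="python:3.7", copy_url="https://example.com/repo.git"))
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.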

Named Images

Sometimes it is useful to specify the image_name in the construction of a conducto.Node rather than the image object itself. The following two code snippets are equivalent, but when using conducto.lazy_py(), it may be more convenient to reference the image by name.

import conducto as co
root = co.Parallel()
root.register_image(co.Image("python:3.8", copy_dir=".", name="base_python"))
root.register_image(co.Image("ruby:2.7", copy_dir=".", name="base_ruby"))
root["RunPython"] = co.Exec("python -c 'print(\"I am running in python\")'", image_name="base_python")
root["RunRuby"] = co.Exec("ruby -e 'puts \"I am doing some ruby\"'", image_name="base_ruby")

import conducto as co
root = co.Parallel()
python_image = co.Image("python:3.8", copy_dir=".")
ruby_image = co.Image("ruby:2.7", copy_dir=".")
root["RunPython"] = co.Exec("python -c 'print(\"I am running in python\")'", image=python_image)
root["RunRuby"] = co.Exec("ruby -e 'puts \"I am doing some ruby\"'", image=ruby_image)

conducto.Node.register_image(self, image: conducto.image.Image)

Register a named Image for use by descendant Nodes that specify image_name. This is especially useful with lazy pipeline creation to ensure that the correct base image is used.
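
One way to picture name-based lookup is a node that walks up its ancestors until it finds an image registered under its image_name. The sketch below is a toy model in plain Python: `ToyNode` and `resolve_image` are hypothetical, images are represented by plain strings, and the nearest-ancestor lookup order is an assumption rather than a description of Conducto's internals.

```python
class ToyNode:
    """Toy stand-in for a pipeline node; not conducto.Node."""

    def __init__(self, parent=None, image_name=None):
        self.parent = parent
        self.image_name = image_name
        self._images = {}  # images registered on this node, keyed by name

    def register_image(self, name, image):
        """Make `image` available to this node and its descendants."""
        self._images[name] = image

    def resolve_image(self):
        """Return the image registered under image_name by the nearest ancestor."""
        node = self
        while node is not None:
            if self.image_name in node._images:
                return node._images[self.image_name]
            node = node.parent
        raise KeyError(f"no image registered under {self.image_name!r}")


root = ToyNode()
root.register_image("base_python", "python:3.8")
root.register_image("base_ruby", "ruby:2.7")
middle = ToyNode(parent=root)
leaf = ToyNode(parent=middle, image_name="base_python")
print(leaf.resolve_image())  # python:3.8
```

Because the lookup happens by name, a lazily created node only needs to carry a string, not the image object itself.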

Image Path Translation

The parameters copy_dir, copy_url and copy_branch take care of many of the simple cases for image path translation. If the path cannot be inferred you can declare mappings via the path_map parameter. There are two cases where path_map is helpful:

  • with copy_url and copy_branch, specify the local path of the checked-out source.

  • if your dockerfile for the image contains a COPY line, you may wish to specify the external and internal paths to enable binding for live debug.
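
Conceptually, a path_map translates a path on your machine into the corresponding path inside the image. The sketch below shows that idea in plain Python. `to_internal_path` is a hypothetical helper, the paths are made up, and the rule that the longest matching external prefix wins is an assumption, not documented Conducto behavior.

```python
from pathlib import PurePosixPath


def to_internal_path(external_path, path_map):
    """Translate a host path to its in-image path using an external->internal map.

    Hypothetical helper for illustration; the longest matching prefix wins.
    """
    path = PurePosixPath(external_path)
    best = None  # (external prefix, internal prefix) of the best match so far
    for external, internal in path_map.items():
        ext = PurePosixPath(external)
        if path == ext or ext in path.parents:
            if best is None or len(ext.parts) > len(best[0].parts):
                best = (ext, PurePosixPath(internal))
    if best is None:
        raise ValueError(f"{external_path} is not covered by path_map")
    ext, internal = best
    # Re-root the remainder of the path under the internal prefix.
    return str(internal / path.relative_to(ext))


path_map = {"/home/me/project": "/usr/src/app"}
print(to_internal_path("/home/me/project/src/main.py", path_map))
# /usr/src/app/src/main.py
```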


Construct a path with decoration to enable translation inside a docker image for a node. This may be used to construct path parameters to a command line tool.

This is used internally by conducto.Exec when used with a Python callable to construct the command line which executes that callable in the pipeline.

Running Exec Nodes

Each Exec node runs in a container, but multiple Exec nodes may share a single container. Conducto provides a few modes for controlling this behavior.

Default: each Exec node usually gets its own container

Normally, Conducto runs each Exec node in its own container. For efficiency, it may reuse a container: if one Exec node finishes and a queued node is compatible with the now-available container, Conducto assigns that queued node to it.

If you expect each Exec node to run independently and not destructively modify the state of its container, this is a great default choice.

Run Exec nodes in a single container

Cases do exist where you want to build up local state over the course of a few nodes. This example starts by installing the python redis package into the container, then uses the newly installed package to read and write data to a redis-server. These steps must all run in the same container, or else the read & write steps would not be able to see the redis package.

import conducto as co
with co.Serial(container_reuse_context=co.ContainerReuseContext.NEW) as test:
    test["Install"] = co.Exec("pip install redis")
    test["Write"] = co.Exec("...")
    test["Read"] = co.Exec("...")

To instruct Conducto that these nodes must share a container, create a new “same container” context: container_reuse_context=co.ContainerReuseContext.NEW. All child nodes below this that have the default of container_reuse_context=None will share this container.
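
The inheritance rule described above can be modeled in a few lines. The sketch below is a toy model in plain Python, not Conducto code: `assign_contexts`, the nested-dict pipeline, and the extra "Lint" node are hypothetical. A node marked NEW opens a fresh context, a node left at None inherits its parent's, and leaves with the same context id are treated as sharing one container.

```python
from itertools import count

NEW = "NEW"  # stands in for co.ContainerReuseContext.NEW in this toy model


def assign_contexts(node, inherited=None, ids=None, out=None):
    """Map each leaf (Exec-like) node name to a shared-container context id.

    An id of None means no shared context applies, so the default
    scheduling described earlier would be used.
    """
    ids = count(1) if ids is None else ids
    out = {} if out is None else out
    # NEW opens a fresh context; anything else inherits the parent's.
    context = next(ids) if node.get("reuse") == NEW else inherited
    children = node.get("children", [])
    if not children:
        out[node["name"]] = context
    for child in children:
        assign_contexts(child, context, ids, out)
    return out


pipeline = {
    "name": "root",
    "children": [
        {"name": "test", "reuse": NEW, "children": [
            {"name": "Install"}, {"name": "Write"}, {"name": "Read"},
        ]},
        {"name": "Lint"},
    ],
}
print(assign_contexts(pipeline))
# {'Install': 1, 'Write': 1, 'Read': 1, 'Lint': None}
```

In this model, Install, Write, and Read share context 1, while Lint sits outside any shared context.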

Another use of ContainerReuseContext.NEW is to start a server in one Exec node and then run a test against it in the next Exec node. Alternatively, you could put these commands in a single Exec node, connected with &&. However, separating them into multiple Exec nodes gives you separate output for each command, which makes debugging easier.

Note: you can also use this feature if you simply want to disable container reuse and ensure that each Exec node gets its own container.