Docker¶
Conducto uses Docker containers to provide portability and scalability. A Docker image is a template that packages your code with fully-defined OS and dependencies, while a container is a running instance of one.
Docker can be intimidating for newcomers and awkward for professionals, so Conducto has a number of features that simplify using it for pipelines.
Images are defined by a Dockerfile, which builds up the image's execution environment step by step. Detailed tutorials can be found elsewhere, but a Dockerfile commonly has a few components:
A base image to build on
FROM python:3.7
Actions to change or enhance the environment
RUN pip install pandas
Commands to put your own code into the container
COPY path/on/my/computer path/in/image
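The three components above can also be assembled into Dockerfile text programmatically, for example to produce a string for the dockerfile_text parameter described later. This is a minimal sketch; render_dockerfile is a hypothetical helper written for illustration and is not part of Conducto or Docker.

```python
def render_dockerfile(base_image, run_commands=(), copies=()):
    """Build Dockerfile text from a base image, RUN commands, and COPY pairs.

    Hypothetical helper for illustration; not part of Conducto or Docker.
    """
    lines = [f"FROM {base_image}"]
    lines += [f"RUN {cmd}" for cmd in run_commands]
    lines += [f"COPY {src} {dst}" for src, dst in copies]
    return "\n".join(lines)


text = render_dockerfile(
    "python:3.7",
    run_commands=["pip install pandas"],
    copies=[("path/on/my/computer", "path/in/image")],
)
print(text)
```

The output reproduces the three example lines shown above, in the same order.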
Image Definition¶
You can specify images for each node (or defaults for the entire pipeline or subtree) using the image parameter of conducto.Exec(). Here are some examples of useful images.
Extending a base image with packages and user-code
# Make a Docker Image based on python:3.7, using all the files in '.' as
# the build context, and `pip install` conducto and pandas.
import conducto as co
co.Image("python:3.7", copy_dir=".", install_pip=["conducto", "pandas"])
Auto-building a Dockerfile
# Run `docker build` on '../../Dockerfile'.
import conducto as co
co.Image(dockerfile="../../Dockerfile")
Specify the text of the Dockerfile programmatically.
# Build:
# FROM python:3.7
# COPY . /my/path
# using this file's directory as the context.
import conducto as co
co.Image(dockerfile_text="FROM python:3.7\nCOPY . /my/path", context=".")
Use a git repo as your build context. Very useful for CI/CD.
# Use the 'main' branch of Conducto's public package as your build context.
import conducto as co
co.Image("python:3.7", copy_branch="main",
         copy_url="https://github.com/conducto/conducto.git")
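For private repositories, the copy_url parameter documented below accepts credentials embedded in the URL, in the form https://{user}:{token}@github.com/…. The sketch below shows one way to build such a URL without hard-coding the token. The helper authenticated_git_url, the environment variable names GIT_USER and GIT_TOKEN, and the repository path are all assumptions made for illustration; none of them come from Conducto.

```python
import os
from urllib.parse import quote


def authenticated_git_url(user, token, repo_path, host="github.com"):
    """Return https://{user}:{token}@{host}/{repo_path}.

    The user and token are percent-encoded so that characters such as
    '/' or '@' cannot break the URL. Hypothetical helper, not part of Conducto.
    """
    return f"https://{quote(user, safe='')}:{quote(token, safe='')}@{host}/{repo_path}"


# GIT_USER and GIT_TOKEN are assumed names; the fallbacks are placeholders.
user = os.environ.get("GIT_USER", "example-user")
token = os.environ.get("GIT_TOKEN", "example-token")

# Hypothetical repository path. Avoid printing or logging the result,
# since it contains the token.
url = authenticated_git_url(user, token, "example-org/example-repo.git")
```

The resulting string would be passed as copy_url together with copy_branch. The secrets documentation referenced in the parameter description covers how to store the token securely.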
class conducto.Image(image=None, *, dockerfile=None, dockerfile_text=None, docker_build_args=None, context=None, copy_repo=None, copy_dir=None, copy_url=None, copy_branch=None, docker_auto_workdir=True, install_pip=None, install_npm=None, install_packages=None, install_docker=False, path_map=None, shell='__auto__', name=None, git_urls=None, instantiation_directory=None, reqs_py=None, reqs_npm=None, reqs_packages=None, reqs_docker=False, **kwargs)
Parameters
image (str) – Specify the base image to start from. Code can be added with the various copy_* parameters, and packages with the install_* parameters.
dockerfile (str) – Use instead of image and pass a path to a Dockerfile. Relative paths are evaluated starting from the file where this code is written. Unless context is specified, it uses the directory of the Dockerfile as the build context.
dockerfile_text (str) – Directly pass the text of a Dockerfile rather than linking to one that’s already written. If you want to use ADD or COPY you must specify context explicitly.
docker_build_args (dict) – Dict mapping the names of docker build --build-arg arguments to their values.
docker_auto_workdir (bool) – Set the work-dir to the destination of copy_dir. Default: True.
context (str) – Use this to specify a custom docker build context when using dockerfile.
copy_repo (bool) – Set to True to automatically copy the entire current Git repo into the Docker image. Use this so that a single Image definition can either use local code or fetch from a remote repo. copy_dir mode: normal use of this parameter uses local code, so it sets copy_dir to point to the root of the Git repo of the calling code. copy_url mode: specify copy_branch to use a remote repository. This is commonly done for CI/CD. When specified, copy_url will be auto-populated.
copy_dir (str) – Path to a directory. All files in that directory (and its subdirectories) will be copied into the generated Docker image.
copy_url (str) – URL to a Git repo. Conducto will clone it and copy its contents into the generated Docker image. Authenticate to private GitHub repos with a URL like https://{user}:{token}@github.com/…. See secrets for more info on how to store this securely. Must also specify copy_branch.
copy_branch (str) – A specific branch name to clone. Required if using copy_url.
path_map (dict) – Dict that maps external_path to internal_path. Needed for live debug and for passing callables to Exec & Lazy. It can be inferred from copy_dir, copy_url, or copy_repo; if not using one of those, you must specify path_map explicitly. This typically happens when a user-generated Dockerfile copies the code into the image.
install_pip (List[str]) – List of Python packages for Conducto to pip install into the generated Docker image.
install_npm (List[str]) – List of npm packages for Conducto to npm i into the generated Docker image.
install_packages (List[str]) – List of packages to install with the appropriate Linux package manager for this image’s flavor.
install_docker (bool) – If True, install Docker during build time.
shell (str) – Which shell to use in this container. Defaults to co.Image.AUTO to auto-detect. AUTO will prefer /bin/bash when available, and fall back to /bin/sh otherwise.
name (str) – Name this Image so other Nodes can reference it by name. If no name is given, one will automatically be generated from a list of our favorite Pokemon. I choose you, angry-bulbasaur!
instantiation_directory (str) – The directory of the file in which this image object was created. It is used to resolve relative paths passed into co.Image, and is populated automatically by Conducto.
reqs_py – Deprecated. Use install_pip instead.
reqs_npm – Deprecated. Use install_npm instead.
reqs_packages – Deprecated. Use install_packages instead.
reqs_docker – Deprecated. Use install_docker instead.
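When updating older pipeline code, the deprecated reqs_* keyword arguments can be renamed mechanically. The sketch below is a hypothetical migration helper, not something Conducto provides. It assumes that reqs_py corresponds to install_pip, the name that appears in the signature above, and that each remaining reqs_* name maps to the install_* name with the same suffix.

```python
import warnings

# Old name -> new name. install_pip is taken from the Image signature;
# the mapping as a whole is an illustrative assumption.
RENAMED = {
    "reqs_py": "install_pip",
    "reqs_npm": "install_npm",
    "reqs_packages": "install_packages",
    "reqs_docker": "install_docker",
}


def migrate_image_kwargs(kwargs):
    """Return a copy of kwargs with deprecated reqs_* keys renamed to install_*."""
    migrated = {}
    for key, value in kwargs.items():
        new_key = RENAMED.get(key, key)
        if new_key != key:
            warnings.warn(
                f"{key} is deprecated; use {new_key}",
                DeprecationWarning,
                stacklevel=2,
            )
        if new_key in migrated:
            # Both the old and the new spelling were supplied.
            raise ValueError(f"{new_key} specified twice")
        migrated[new_key] = value
    return migrated


old_style = {"copy_dir": ".", "reqs_py": ["conducto", "pandas"]}
new_style = migrate_image_kwargs(old_style)
print(new_style)
```

The migrated dict could then be unpacked into the constructor, as in co.Image("python:3.7", **new_style).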
Named Images¶
Sometimes it is useful to specify image_name when constructing a conducto.Node rather than passing the image object itself. The following code snippets are equivalent, but when using conducto.lazy_py(), it may be useful to reference the image by name.
import conducto as co
root = co.Parallel()
root.register_image(co.Image("python:3.8", copy_dir=".", name="base_python"))
root.register_image(co.Image("ruby:2.7", copy_dir=".", name="base_ruby"))
root["RunPython"] = co.Exec("python -c 'print(\"I am running in python\")'", image_name="base_python")
root["RunRuby"] = co.Exec("ruby -e 'puts \"I am doing some ruby\"'", image_name="base_ruby")
import conducto as co
root = co.Parallel()
python_image = co.Image("python:3.8", copy_dir=".")
ruby_image = co.Image("ruby:2.7", copy_dir=".")
root["RunPython"] = co.Exec("python -c 'print(\"I am running in python\")'", image=python_image)
root["RunRuby"] = co.Exec("ruby -e 'puts \"I am doing some ruby\"'", image=ruby_image)
conducto.Node.register_image(self, image: conducto.image.Image)
Register a named Image for use by descendant Nodes that specify image_name. This is especially useful with lazy pipeline creation to ensure that the correct base image is used.
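One way to picture name-based lookup is a walk from a node up through its ancestors until a registered image with the requested name is found. The sketch below is a toy model of that idea only. SketchNode and resolve_image are hypothetical stand-ins, and the rule that the nearest ancestor wins is an assumption; Conducto's actual lookup is not described here.

```python
class SketchNode:
    """Toy tree node for illustration; a stand-in, not conducto.Node."""

    def __init__(self, parent=None):
        self.parent = parent
        self.images = {}  # image name -> image description

    def register_image(self, name, image):
        self.images[name] = image

    def resolve_image(self, image_name):
        """Walk from this node up to the root and return the first match."""
        node = self
        while node is not None:
            if image_name in node.images:
                return node.images[image_name]
            node = node.parent
        raise KeyError(f"no image named {image_name!r} registered on an ancestor")


root = SketchNode()
root.register_image("base_python", "python:3.8")
root.register_image("base_ruby", "ruby:2.7")

child = SketchNode(parent=root)
print(child.resolve_image("base_python"))  # python:3.8
```

Because the child stores only a name, it can be created before or apart from the image object, which is the property that makes names convenient for lazily generated nodes.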
Image Path Translation¶
The parameters copy_dir, copy_url and copy_branch take care of many of the simple cases for image path translation. If the path cannot be inferred you can declare mappings via the path_map parameter. There are two cases where path_map is helpful:
With copy_url and copy_branch, specify the local path of the checked-out source.
If the image's Dockerfile contains a COPY line, you may wish to specify the external and internal paths to enable binding for live debug.
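Conceptually, a path_map is a dict from external (host) directories to internal (in-image) directories, and translation rewrites the matching prefix of a path. The sketch below illustrates that idea with a hypothetical translate_path function and made-up example paths. It is not Conducto's implementation, and the longest-prefix rule is an assumption.

```python
import posixpath


def translate_path(path, path_map):
    """Map an external (host) path to its internal (in-image) path.

    path_map maps external directory -> internal directory. The longest
    matching external prefix wins. Returns None if nothing matches.
    """
    path = posixpath.normpath(path)
    best = None
    for external, internal in path_map.items():
        external = posixpath.normpath(external)
        if path == external or path.startswith(external.rstrip("/") + "/"):
            if best is None or len(external) > len(best[0]):
                best = (external, internal)
    if best is None:
        return None
    external, internal = best
    rel = posixpath.relpath(path, external)
    return internal if rel == "." else posixpath.join(internal, rel)


# Hypothetical mapping: the host checkout was COPY'd to /usr/local/app.
path_map = {"/home/me/project": "/usr/local/app"}
print(translate_path("/home/me/project/src/app.py", path_map))
# /usr/local/app/src/app.py
```

A path outside every mapped directory returns None here, which mirrors the situation where the mapping cannot be inferred and must be declared explicitly.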
conducto.relpath(path)
Construct a path with decoration to enable translation inside a Docker image for a node. This may be used to construct path parameters to a command-line tool.
conducto.Exec uses this internally, when given a Python callable, to construct the command line that executes the callable in the pipeline.
Running Exec Nodes¶
Each Exec node runs in a container, but multiple Exec nodes may share a single container. Conducto provides a few modes for controlling this behavior.
Default: each Exec node usually gets its own container¶
Normally, Conducto runs each Exec node in its own container. For efficiency it may reuse a container: if one Exec node finishes and a queued node is compatible with the now-available container, Conducto assigns that queued node to the container.
If you expect each Exec node to run independently and not destructively modify the state of its container, this is a great default choice.
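The reuse rule can be pictured as a small scheduling step: when a container becomes free, look through the queue for the first node that is compatible with it. The sketch below is a toy model under the assumption that "compatible" means "uses the same image", which is not stated above. pick_compatible is a hypothetical function, not Conducto's scheduler.

```python
def pick_compatible(freed_image, queue):
    """Remove and return the first queued node whose image matches the freed container.

    queue is a list of (node_name, image) pairs in queue order. Returns None
    when no queued node is compatible, in which case the freed container
    would not be reused.
    """
    for i, (name, image) in enumerate(queue):
        if image == freed_image:
            queue.pop(i)
            return name
    return None


queue = [("lint", "node:14"), ("test", "python:3.8"), ("docs", "python:3.8")]
print(pick_compatible("python:3.8", queue))  # test
print(queue)  # [('lint', 'node:14'), ('docs', 'python:3.8')]
```

In this model a node can inherit whatever state the previous node left in the container, which is why the default suits nodes that do not destructively modify their environment.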
Run Exec nodes in a single container¶
There are cases where you want to build up local state over the course of a few nodes. This example starts by installing the Python redis package into the container, then uses the newly installed package to read and write data to a redis-server. These steps must all run in the same container, or else the read and write steps would not be able to see the redis package.
import conducto as co
with co.Serial(container_reuse_context=co.ContainerReuseContext.NEW) as test:
    test["Install"] = co.Exec("pip install redis")
    test["Write"] = co.Exec("...")
    test["Read"] = co.Exec("...")
To instruct Conducto that these nodes must share a container, create a new “same container” context: container_reuse_context=co.ContainerReuseContext.NEW. All child nodes below this that have the default of container_reuse_context=None will share this container.
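The scoping rule can be modeled as inheritance down the tree: a node that sets NEW starts a scope, and descendants that leave the setting at None fall into the nearest enclosing scope. The sketch below is a toy model of that rule. ToyNode, the NEW constant, and shared_container_scopes are hypothetical stand-ins rather than Conducto's classes.

```python
NEW = "NEW"  # stand-in for co.ContainerReuseContext.NEW


class ToyNode:
    """Toy pipeline node for illustration; not conducto.Node."""

    def __init__(self, name, container_reuse_context=None, children=()):
        self.name = name
        self.container_reuse_context = container_reuse_context
        self.children = list(children)


def shared_container_scopes(node, scope=None, result=None):
    """Map each leaf node name to the nearest ancestor-or-self that set NEW.

    Leaves mapped to the same scope would share a container; None means the
    default behavior applies.
    """
    if result is None:
        result = {}
    if node.container_reuse_context == NEW:
        scope = node.name
    if not node.children:
        result[node.name] = scope
    for child in node.children:
        shared_container_scopes(child, scope, result)
    return result


test = ToyNode("test", NEW, [ToyNode("Install"), ToyNode("Write"), ToyNode("Read")])
root = ToyNode("root", children=[test, ToyNode("Other")])
print(shared_container_scopes(root))
# {'Install': 'test', 'Write': 'test', 'Read': 'test', 'Other': None}
```

Install, Write, and Read all resolve to the scope started by test, matching the example above, while Other sits outside that scope and keeps the default behavior.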
Another use of ContainerReuseContext.NEW is to start a server in one Exec node and then run a test against it in the next Exec node. Alternatively, you could put these commands in a single Exec node, connected with &&. However, separating them into multiple Exec nodes gives you separate output for each command, which improves clarity and makes debugging easier.
Note: you can also use this feature if you simply want to disable container reuse and ensure that each Exec node gets its own container.