Before it runs a pipeline, Conducto builds an image for every Exec node. This is necessary because nodes run in containers.

Assign an Image to a Node

Assign images to nodes with the image node parameter. It accepts an Image object.

import conducto as co

py_img = co.Image("python:3.8", install_pip=["pandas"])
co.Exec("python somescript.py", image=py_img)  # can import pandas

As with other node parameters, parent nodes pass values along to their children.

parent = co.Serial(image=py_img)
parent["child"] = co.Exec("python somescript.py")  # same as above
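To picture how a value set on a parent reaches its children, here is a small stand-alone model. It is only a sketch, not Conducto's implementation: ToyNode and effective_image are hypothetical names, and it assumes that a value set directly on a child takes precedence over the inherited one.

```python
class ToyNode:
    """Hypothetical stand-in for a pipeline node; not part of Conducto."""

    def __init__(self, image=None):
        self.image = image      # value set directly on this node, if any
        self.parent = None
        self.children = {}

    def __setitem__(self, name, child):
        # Attaching a child records its parent so lookups can walk upward.
        child.parent = self
        self.children[name] = child

    def effective_image(self):
        # Walk up the tree until this node or an ancestor supplies an image.
        node = self
        while node is not None:
            if node.image is not None:
                return node.image
            node = node.parent
        return None  # nothing set anywhere; a default would apply


parent = ToyNode(image="python:3.8")
parent["child"] = ToyNode()                       # inherits from parent
parent["override"] = ToyNode(image="ubuntu:groovy")  # assumed to win over parent

print(parent.children["child"].effective_image())
print(parent.children["override"].effective_image())
```

The child with no image of its own resolves to the parent's value, which is why the two definitions above behave the same.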

If you don't want to apply any customizations to the image, you can also pass a string.

co.Exec("curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv --output /tmp/earthquakes.csv",
        image="byrnedo/alpine-curl")

Customize an Image

Conducto selects from these stages when it makes an image:

[Diagram: the image build stages, each labeled with its related Image parameter]

Each Image parameter in the diagram has its own section below.

Combining different Image parameters will make Conducto build different images. The rest of this article is about how to combine them to get what you want.

Use the Default Image

If you don't supply an image, Conducto will use the default one, which is based on Debian.

co.Exec("egrep '^NAME' /etc/os-release")  # Debian GNU/Linux

Image objects created without a positional argument will also use the default image.

img = co.Image(install_packages=["wget"]) # uses Debian too

Use an Unmodified Image

If you want to use an image without additional modification, simply name it.

co.Exec("echo hello ubuntu", image="ubuntu:groovy")

Often you can find an image for a specific tool, rather than starting with something generic and installing it after the fact.

cmd = "mongo $DB_HOST --eval 'printjson(db.serverStatus())'"
co.Exec(cmd, image="mongo:3.6")

DockerHub has many ready-made environments to choose from. You can also publish your own or use a private container repository.

Copy Files into the Image

The copy_* image parameters tell Conducto to copy files into the image. They also set the node working directory to the files' parent, which makes them easy to reference.

Copy from the Local Filesystem

copy_dir will copy from the local filesystem. It takes an absolute path, or a path relative to the pipeline definition file.

local = co.Image(copy_dir=".")
co.Exec("ls", image=local)

The node defined above will list the contents of the pipeline definition's parent.
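If it helps to see the path rule spelled out, the sketch below resolves a copy_dir value the way the text describes: absolute paths are kept, and relative ones are anchored at the directory that holds the pipeline definition. resolve_copy_dir is a hypothetical helper written for illustration, and the example paths are made up. It is not how Conducto necessarily does this internally.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_copy_dir(copy_dir, pipeline_file):
    """Hypothetical helper: turn a copy_dir value into an absolute path."""
    path = PurePosixPath(copy_dir)
    if not path.is_absolute():
        # Relative paths start from the pipeline definition's directory.
        path = PurePosixPath(pipeline_file).parent / path
    # Collapse "." and ".." segments so the result is easy to read.
    return posixpath.normpath(str(path))


print(resolve_copy_dir(".", "/home/me/proj/pipeline.py"))
print(resolve_copy_dir("../data", "/home/me/proj/pipeline.py"))
print(resolve_copy_dir("/srv/app", "/home/me/proj/pipeline.py"))
```

With copy_dir=".", the result is the pipeline definition's parent directory, which matches what the ls node lists.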

Copy from a Remote Git Repo

copy_url and copy_branch tell Conducto to first clone a remote git repo, and later copy it into the image.

remote = co.Image(copy_url="https://github.com/conducto/examples",
                  copy_branch="main")

Since pipeline definitions are just code, it's possible to control explicitly which image a node uses.

if is_local:  # is_local: a boolean you define yourself
    root = co.Serial(image=local)
else:
    root = co.Serial(image=remote)
root["child"] = co.Exec("pytest")

In most cases, that isn't necessary. Instead you can let Conducto handle it with copy_repo--a local/remote hybrid.

Copy Based on Context

If you're working with uncommitted changes you'll want Conducto to put local files in its images. On the other hand, sometimes remote files are the only option. copy_repo=True tells Conducto to use local files if possible, and remote files if necessary.

img = co.Image("python:3.8-alpine", copy_repo=True)

If somebody launched the pipeline manually, copy_repo=True tells Conducto to copy the whole enclosing git repo into the image. This is the case even if the pipeline will ultimately run in cloud mode (since there's still a local filesystem involved at launch time).

copy_branch is optional when used with copy_repo. Without it, Conducto will guess both values (the URL and the branch) based on the calling integration. With it, Conducto will guess only the URL.

Providing copy_repo=True (without copy_branch) covers what we expect is the typical case: the pipeline operates on code in its own repo.
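The rules in this section can be summarized as a small decision function. This is a rough sketch of the behavior described above, not Conducto code: plan_copy, its arguments, and the integration dictionary are all hypothetical names chosen for illustration.

```python
def plan_copy(launched_manually, copy_branch=None, integration=None):
    """Hypothetical model of what copy_repo=True decides; not Conducto code."""
    if launched_manually:
        # A local checkout exists at launch time, so local files are used.
        return {"source": "local"}

    # Otherwise fall back to the remote repo. The URL is always guessed from
    # the calling integration; the branch is guessed only if not supplied.
    integration = integration or {}
    return {
        "source": "remote",
        "url": integration.get("url"),
        "branch": copy_branch or integration.get("branch"),
    }


hook = {"url": "https://github.com/conducto/examples", "branch": "main"}
print(plan_copy(True))
print(plan_copy(False, integration=hook))
print(plan_copy(False, copy_branch="dev", integration=hook))
```

The third call shows the effect of passing copy_branch: the branch comes from you, and only the URL is inferred.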

Reference Copied Files

While using the copy_* image parameters, you can often get away with ignoring the absolute paths involved. Suppose you launch a pipeline from a directory that looks like this:

.
├── hello.txt
└── pipeline.py

And pipeline.py contains a node defined like this:

img = co.Image(copy_dir=".")
co.Exec("cat hello.txt", image=img)

Your node will reference the right file without knowing its absolute path. This is because Conducto created a directory, copied your file into it, and made it the node's working directory.
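You can reproduce the same effect outside Conducto with nothing but the standard library. The sketch below copies a directory somewhere else and then runs a command with the copy as its working directory. The temporary directories stand in for your project and for the image, and the child Python process stands in for cat hello.txt. It illustrates the idea only and says nothing about where Conducto actually puts the files.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    # Stand-in for the local project directory.
    Path(src, "hello.txt").write_text("hello\n")

    # Step 1: copy the files to a new location (stand-in for the image).
    workdir = Path(dst, "copied")
    shutil.copytree(src, workdir)

    # Step 2: run the command with that location as its working directory,
    # so the relative name "hello.txt" resolves without an absolute path.
    result = subprocess.run(
        [sys.executable, "-c", "print(open('hello.txt').read(), end='')"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    output = result.stdout

print(output, end="")
```

The command never mentions where the copy lives. Setting the working directory is what makes the relative name resolve.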

Install Software into the Image

If your image includes a supported package manager, you can tell Conducto to install software for you.

Install Python Packages

To have Conducto preinstall Python packages, use the install_pip image parameter.

import conducto as co
img = co.Image("python:3.8", install_pip=["sympy"])

If your nodes call native functions in the same module as your pipeline definition, consider moving the imports inside those functions. That way you don't need to bother with them if you just want to launch the pipeline--they're node dependencies, not launch dependencies.

# from sympy import symbols, solve

def print_sol(y_str, z_str):
    # Imported here so that launching the pipeline doesn't require sympy
    from sympy import symbols, solve
    x, y, z = symbols(['x', y_str, z_str])
    print(solve(x**2 + y * x + z, x))

img = co.Image(install_pip=["conducto", "sympy"])
root = co.Serial(image=img)
root["solution"] = co.Exec(print_sol, "hello", "world")
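The reason this works is plain Python behavior: an import inside a function body runs when the function is called, not when it is defined. The sketch below demonstrates that with a deliberately missing module. The module name not_installed_here_xyz and the function node_only_work are made up for the demonstration, and the example assumes no such module exists on your machine.

```python
def node_only_work():
    # This import runs only when the function is called (i.e. on the node).
    import not_installed_here_xyz  # hypothetical module, assumed absent
    return not_installed_here_xyz


# Defining the function did not require the module...
defined_ok = callable(node_only_work)

# ...but calling it does.
try:
    node_only_work()
    call_failed = False
except ImportError:
    call_failed = True

print(defined_ok, call_failed)
```

A module-level import of the same name would have failed on the first line, before any pipeline could be built.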

Install Linux Packages

If you pick an image with a supported OS package manager (apt, apk, yum, or dnf), Conducto can install OS packages for you. install_packages is the image parameter for this.

The snippet below installs jq and uses it to parse some JSON.

co.Exec("""echo '{"message": "Hello World"}' | jq '.message'""",
    image=co.Image(install_packages=["jq"])
)
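To give a feel for what "supported package manager" implies, here is a sketch that turns a package list into a typical install command for each of the four managers named above. The exact commands Conducto runs are not documented here, so treat install_command and INSTALL_TEMPLATES as hypothetical; the command lines are just common usage for each tool.

```python
import shlex

# Common non-interactive install invocations (an assumption, not Conducto's own).
INSTALL_TEMPLATES = {
    "apt": "apt-get update && apt-get install -y {pkgs}",
    "apk": "apk add --no-cache {pkgs}",
    "yum": "yum install -y {pkgs}",
    "dnf": "dnf install -y {pkgs}",
}


def install_command(manager, packages):
    """Hypothetical helper: build a shell command that installs packages."""
    if manager not in INSTALL_TEMPLATES:
        raise ValueError(f"unsupported package manager: {manager}")
    # Quote each name so unusual characters can't break the shell command.
    pkgs = " ".join(shlex.quote(p) for p in packages)
    return INSTALL_TEMPLATES[manager].format(pkgs=pkgs)


print(install_command("apt", ["jq"]))
print(install_command("apk", ["jq", "curl"]))
```

An image whose base has none of these managers falls outside this table, which is why install_packages depends on the image you pick.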

Install the Docker Client

There's also install_docker, which will install docker on your image. When you reference it, be sure to also use the requires_docker node parameter.

co.Exec("docker run hello-world",
    image=co.Image(install_docker=True),  # installs docker client
    requires_docker=True                  # connects client to daemon
)

requires_docker=True mounts a Docker socket into that node's filesystem. In local mode it will be your own Docker socket, so be sure you understand the security implications before launching pipelines from definitions that do this.

Go Custom with a Dockerfile

The traditional way to create docker images is by writing a Dockerfile. If you already have one lying around, or you want to go beyond what the Image class supports, you can have Conducto use it instead.

Here's one:

FROM debian:bullseye-20200720-slim
RUN apt-get update && apt-get install -y git cowsay
RUN git clone http://github.com/possatti/pokemonsay
RUN cd pokemonsay && ./install.sh
ENV PATH "/usr/games:/root/bin:${PATH}"

The node's filesystem will include whatever modifications the Dockerfile made.

img = co.Image(dockerfile="Dockerfile")
co.Exec("pokemonsay -p Oddish -n 'Hi'", image=img)

This lets you configure your image however you like.

Control the Build Context

If your Dockerfile copies files into the image, it might use a relative path.

COPY . /usr/local/src/myapp

That . is a reference to your Docker build context. If you used copy_dir, it's the directory that contains your Dockerfile. If copy_repo=True is present, your build context will be the repo root.

You can set the Docker build context explicitly with the optional context image parameter.

img = co.Image(dockerfile="./dockerfiles/testenv", context="../testdata")

Mount Local Files while Debugging

Most of the Image parameters in this article take effect when Conducto builds an image. path_map is an exception.

When you start a live debug session, Conducto mounts files from your development environment so that they hide files in the image. To do that it needs to answer:

  • What parts of the local development environment should I mount?
  • Where in the image should I mount them?

path_map is an answer to both questions. It takes a dictionary where the keys are paths to local files, and the values are those files' mount points in the image filesystem.

img = co.Image("myImage", path_map={".": "/usr/local/src/myapp"})

The usage of path_map shown above says:

  • Take the directory containing the pipeline definition, and
  • put it at an absolute path in the image.
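One way to think about a path map is as a list of bind mounts. The sketch below converts a path_map dictionary into docker run style -v arguments, resolving relative keys against the pipeline directory. It is an illustration under assumptions: mounts_from_path_map is a hypothetical helper, the example directory is made up, and whether Conducto builds its mounts this way is not something this article states.

```python
import posixpath


def mounts_from_path_map(path_map, pipeline_dir):
    """Hypothetical helper: express a path_map as docker-style -v arguments."""
    args = []
    for local, target in path_map.items():
        if not posixpath.isabs(local):
            # Keys are relative to the directory holding the pipeline definition.
            local = posixpath.normpath(posixpath.join(pipeline_dir, local))
        # "-v host:container" mounts the host path over the image path.
        args += ["-v", f"{local}:{target}"]
    return args


print(mounts_from_path_map({".": "/usr/local/src/myapp"}, "/home/me/proj"))
```

Each key answers "what should I mount?" and each value answers "where should I mount it?", which is the pair of questions path_map exists to settle.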

Presumably those files made it into the image some other way. Since Conducto didn't copy them, you'll have to provide a path_map--otherwise live debug sessions won't be available.

Use an Implicit Path Map

If you use copy_dir, you don't need to provide path_map. Since Conducto copied the files, it remembers where they came from.

Images populated with copy_url and copy_repo won't have that info. You'll need to provide a path map if you want to debug live code in those pipelines. In this context, local paths are relative to the pipeline directory and image paths are relative to the root of the repo.