Exec nodes run a shell command in a container.

Conducto reuses these containers for two reasons: for speed, to save on startup time, and for state, to let multiple nodes build up their container's state incrementally.

Reuse for speed

Best practices

Follow these tips to maximize parallelism and minimize latency:

  • Divide your work into many small tasks.
  • Make an Exec node for each task, and group them under a Parallel node so they can run concurrently.
  • Use the same resources for each node.

How it works

Containers can take considerable time to start up if they are large or if hardware needs to be provisioned. Conducto takes care to minimize this overhead while parallelizing for maximum throughput.

Each Exec node has some resources specified (cpu, mem, image, requires_docker, and docker_run_args), and Conducto ensures that the node runs in a container that matches them. When an Exec node moves from Pending to Queued, Conducto:

  • Finds the work queue with matching resources, creating one if needed.
  • Places this Exec node at the back of the queue.
  • Launches worker processes in containers with the specified resources.

The worker processes run Exec nodes from their queue until it is empty, then they exit.
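The queueing steps above can be pictured with a small pure-Python toy model. This is only a sketch, not Conducto's implementation: the `Resources` and `Scheduler` classes, their default values, and the node names are all hypothetical, chosen to mirror the resource settings and the three steps listed above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical stand-in for an Exec node's resource settings."""
    cpu: float = 1
    mem: float = 2
    image: str = "python:3.11"
    requires_docker: bool = False
    docker_run_args: tuple = ()


class Scheduler:
    """Toy model: one FIFO work queue per distinct set of resources."""

    def __init__(self):
        self.queues = {}  # maps Resources -> deque of node names

    def enqueue(self, name, resources):
        # Find the queue with matching resources, creating one if needed,
        # then place the node at the back of it.
        self.queues.setdefault(resources, deque()).append(name)

    def run_worker(self, resources):
        # A worker runs nodes from its queue until it is empty, then exits.
        finished = []
        queue = self.queues.get(resources, deque())
        while queue:
            finished.append(queue.popleft())
        return finished


scheduler = Scheduler()
small = Resources(cpu=1, mem=2)
large = Resources(cpu=4, mem=16)
for i in range(3):
    scheduler.enqueue(f"unit-test-{i}", small)
scheduler.enqueue("train-model", large)

queue_count = len(scheduler.queues)       # two distinct resource sets
small_order = scheduler.run_worker(small)
large_order = scheduler.run_worker(large)
```

In this model, nodes that share identical resources land in the same queue and can be served by the same worker, which is why using the same resources for each node helps container reuse.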

If you split your work into many Exec nodes, Conducto can optimize the execution. For short tasks, a few worker processes will run all of them quickly before many containers are launched. If tasks take a long time, the worker pool will grow to run them all concurrently.
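To make the difference between short and long tasks concrete, here is a rough discrete-time simulation. The launch policy is an assumption made for illustration (containers start one after another, and new ones are requested only while tasks are still waiting); Conducto's actual scaling logic is not described here and may differ. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(n_tasks, task_time, startup_time):
    """Return (workers_that_ran_tasks, finish_tick) for a toy worker pool.

    Assumed policy, not Conducto's actual algorithm: containers start one
    after another, each taking `startup_time` ticks, and the pool only
    keeps growing while tasks are still waiting in the queue.
    """
    waiting = n_tasks            # tasks still in the queue
    busy_until = []              # per worker: tick at which it is free again
    used = set()                 # workers that ran at least one task
    next_ready = startup_time    # tick at which the next container is ready
    t = 0
    finish = 0
    while waiting > 0:
        if t == next_ready:
            busy_until.append(t)             # a new worker joins, idle
            next_ready = t + startup_time    # start provisioning the next one
        for w, free_at in enumerate(busy_until):
            if waiting > 0 and free_at <= t:
                waiting -= 1                 # worker takes the next task
                busy_until[w] = t + task_time
                used.add(w)
                finish = max(finish, t + task_time)
        t += 1
    return len(used), finish


# 20 short tasks: a handful of workers drain the queue before more start.
short_workers, short_finish = simulate(n_tasks=20, task_time=1, startup_time=5)

# 20 long tasks: the pool keeps growing until every task has its own worker.
long_workers, long_finish = simulate(n_tasks=20, task_time=100, startup_time=5)
```

Under these assumptions, the short tasks finish on 3 workers, while the long tasks end up on 20 workers, one per task.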

Reuse for state

Motivation

CI/CD pipelines frequently require steps that 1) install some software, 2) start a server, and 3) run some tests. You can combine these steps into one shell script in a single Exec node. If you break them into separate nodes instead, you can see each step's individual output, timing, and success.

Usage

The single-node version might look like this:

co.Exec("""
    npm i
    ./start-server.sh
    python test.py
""")

To break it into separate nodes, set container_reuse_context to sidestep the normal work queues and force Conducto to run all children in a single container:

with co.Serial(container_reuse_context=co.ContainerReuseContext.NEW):
    co.Exec("npm i", name="install")
    co.Exec("./start-server.sh", name="start")
    co.Exec("python test.py", name="test")

Drawbacks

By circumventing the work queues, you lose the ability to parallelize.

Because the state is built up over the course of the commands, the nodes have to be run as a unit, and cannot be skipped or reset individually.

Reuse for data?

Because of these drawbacks, ContainerReuseContext.NEW is not great for passing data between nodes. You can do it, but you lose a lot of the benefits of Conducto. To learn how to pass data between nodes without relying on a shared filesystem, check out conducto.Data instead.

Also, now that you understand how containers are created in the context of a pipeline, you're ready to start asking Conducto to create standalone containers for a given node, which is how you debug a Conducto pipeline.
