Exec nodes run a shell command in a container.
Conducto reuses these containers for two reasons: for speed, to save on startup time, and for state, to let multiple nodes build up their container's state incrementally.
Follow these tips to maximize parallelism and minimize latency:
- Divide your work into many small tasks.
- Make an Exec node for each task, and group them under a Parallel node so they can run concurrently.
- Use the same resources for each node.
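As a minimal sketch of these tips, the snippet below splits a batch job into one Exec node per input file and groups them under a Parallel node. The script name `process.py`, the file names, and the helper names `shard_commands` and `build_pipeline` are hypothetical; `co.Parallel`, `co.Exec`, and the `name=` argument are used as they appear elsewhere on this page. Conducto is imported inside `build_pipeline` so the command-building helper can be tried without it installed.

```python
import shlex


def shard_commands(paths, script="python process.py"):
    """Build one small shell command per input file (names are illustrative)."""
    return {
        f"process_{i}": f"{script} {shlex.quote(path)}"
        for i, path in enumerate(paths)
    }


def build_pipeline(paths):
    """Create one Exec node per command, grouped under a Parallel node."""
    import conducto as co  # imported lazily; requires Conducto to be installed

    # No per-node resource overrides, so every child uses the same resources.
    with co.Parallel() as root:
        for name, command in shard_commands(paths).items():
            co.Exec(command, name=name)
    return root
```

Because every child here is created the same way, the nodes should all land in the same work queue and can share workers.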
Containers can take considerable time to start up if they are large or if hardware needs to be provisioned. Conducto takes care to minimize this overhead while parallelizing for maximum throughput.
Each Exec node has some resources specified (such as docker_run_args), and Conducto ensures it runs in a container that matches.
When an Exec node moves from Pending to Queued, Conducto:
- Finds the work queue with matching resources, creating one if needed.
- Places this Exec node at the back of the queue.
- Launches worker processes in containers with the specified resources.
The worker processes run Exec nodes from their queue until it is empty, then they exit.
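The queueing steps above can be pictured with a small toy model. This is not Conducto's implementation, only a sketch of the idea: queues are keyed by a resource description, nodes join the back of the matching queue, and a worker drains its queue and then exits. The class names, the `image` field, and `run_worker` are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resources:
    """Hashable description of what a container needs (illustrative fields)."""
    image: str                    # hypothetical field
    docker_run_args: tuple = ()   # the one resource named in the text above


@dataclass
class WorkQueue:
    resources: Resources
    pending: deque = field(default_factory=deque)


class Scheduler:
    def __init__(self):
        self.queues = {}  # maps Resources -> WorkQueue

    def enqueue(self, node_name, resources):
        # Find the work queue with matching resources, creating one if needed.
        if resources not in self.queues:
            self.queues[resources] = WorkQueue(resources)
        queue = self.queues[resources]
        # Place the node at the back of the queue.
        queue.pending.append(node_name)
        return queue


def run_worker(queue):
    """Run nodes from the front of the queue until it is empty, then exit."""
    finished = []
    while queue.pending:
        finished.append(queue.pending.popleft())
    return finished
```

In this model, two nodes with equal `Resources` share a queue, while a node with different `docker_run_args` gets a queue of its own.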
If you split your work into many Exec nodes, Conducto can optimize the execution.
For short tasks, a few worker processes will run all of them quickly before many containers are launched.
If tasks take a long time, the worker pool will grow to run them all concurrently.
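To see why task length changes the size of the worker pool, here is a toy simulation. It rests on assumptions that are not taken from Conducto itself: time advances in ticks, one new worker is launched per tick while tasks are still waiting, and each worker needs `startup_ticks` before it can take work. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(num_tasks, task_ticks, startup_ticks=3):
    """Return (workers_launched, finish_tick) for a toy worker pool."""
    waiting = num_tasks   # tasks still sitting in the queue
    free_at = []          # per worker: the tick at which it can take a task
    finish_tick = 0
    tick = 0
    while waiting > 0:
        # Launch one more worker while work is still waiting (assumed policy).
        free_at.append(tick + startup_ticks)
        # Every worker that is started and idle takes the next task.
        for i, ready in enumerate(free_at):
            if ready <= tick and waiting > 0:
                free_at[i] = tick + task_ticks
                finish_tick = max(finish_tick, free_at[i])
                waiting -= 1
        tick += 1
    return len(free_at), finish_tick


short_workers, _ = simulate(num_tasks=20, task_ticks=1)
long_workers, _ = simulate(num_tasks=20, task_ticks=50)
print(short_workers, long_workers)
```

Under these assumptions, the short-task run should launch fewer workers than the long-task run, because the first few workers empty the queue before more containers are needed.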
CI/CD pipelines frequently require steps that (1) install some software, (2) start a server, and (3) run some tests.
You can combine these steps into one shell script in a single Exec node.
If you break them into separate nodes, you can see each step's individual output, timing, and success.
The single-node version might look like this:
    co.Exec("""
        npm i
        ./start-server.sh
        python test.py
    """)
To break it into separate nodes, set container_reuse_context to sidestep the normal work queues and force Conducto to run all children in a single container:
    with co.Serial(container_reuse_context=co.ContainerReuseContext.NEW):
        co.Exec("npm i", name="install")
        co.Exec("./start-server.sh", name="start")
        co.Exec("python test.py", name="test")
By circumventing the work queues, you lose the ability to parallelize.
Because the state is built up over the course of the commands, the nodes have to be run as a unit, and cannot be skipped or reset individually.
Because of these drawbacks, ContainerReuseContext.NEW is not great for passing data between nodes.
You can do it, but you lose a lot of the benefits of Conducto.
To learn how to pass data between nodes without relying on a shared filesystem, check out conducto.Data instead.
Now that you understand how containers are created in the context of a pipeline, you're ready to ask Conducto to create a standalone container for a given node, which is how you debug a Conducto pipeline.