Controlling a Pipeline
Whether a pipeline is running locally, in the cloud, or on a teammate's machine, conducto.com/app is the place to view and control it. This article will show you how to use the Conducto web app to do so.
The example below demonstrates several ways of manipulating a launched pipeline. It assumes that you have Conducto and Docker installed. If you're not sure about this, head over to Getting Started.
We'll start with a definition containing three nodes that do the same job in different ways. Then we'll launch a pipeline from that definition and modify its node parameters. It's set up like a race, so our modifications will affect who wins.
Our example pipelines are on GitHub. This page uses the one called "Compression Race".
Its commands have access to GNU Parallel, GZIP, and payload files containing between 50 MB and 250 MB of noise. If you're curious about how these things got there, feel free to skip ahead to Images where you'll learn about Dockerfiles.
Let's clone the example and look at the pipeline's structure:
$ git clone https://github.com/conducto/examples
$ cd examples/compression_race
$ python pipeline.py
/
├─0 container parallelism
│ ├─ worker 0    gzip -1 payload1.dat
│ ├─ worker 1    gzip -1 payload2.dat
│ ├─ worker 2    gzip -1 payload3.dat
│ ├─ worker 3    gzip -1 payload4.dat
│ └─ worker 4    gzip -1 payload5.dat
├─1 process parallelism    ls payload*.dat | parallel gzip -1
└─2 no parallelism    gzip -1 payload*.dat
This pipeline encapsulates three strategies for compressing five files:
- In five containers
- In a single container, with five processes in parallel
- In a single container, sequentially in a single process
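Outside of Conducto, the difference between the last two strategies can be sketched in plain Python. This is an illustration only; the names compress and run_race are hypothetical and not part of the example repository:

```python
import gzip
from multiprocessing import Pool

def compress(payload: bytes) -> bytes:
    # compresslevel=1 mirrors gzip -1: fastest, least compression.
    # mtime=0 keeps the output deterministic across runs.
    return gzip.compress(payload, compresslevel=1, mtime=0)

def run_race(payloads):
    # "no parallelism": one process compresses each payload in turn
    sequential = [compress(p) for p in payloads]
    # "process parallelism": one worker per payload, like
    # `ls payload*.dat | parallel gzip -1`
    with Pool(len(payloads)) as pool:
        parallel = pool.map(compress, payloads)
    return sequential, parallel

if __name__ == "__main__":
    # small stand-ins for the 50 MB to 250 MB noise files
    payloads = [bytes([i]) * 100_000 for i in range(5)]
    seq, par = run_race(payloads)
    assert seq == par  # same inputs and settings, same bytes out
```

The "container parallelism" strategy has no direct stand-in here, since it depends on Docker; in Conducto it simply means each gzip command gets its own node and therefore its own container.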
Launch a pipeline from this definition by running pipeline.py and passing the --local flag:

python pipeline.py --local

For more about local mode, see Local vs Cloud.
Conducto will look at the definition and launch a pipeline with a random name (mine was 'bei-hou').
Then it will direct you to a URL where you can view and control the pipeline in your browser.
When you view a freshly launched pipeline in your browser, all nodes will be pending, and the 'Run' toggle will be unset.
Since this pipeline is set up like a race, changing how many CPU cores each node has access to will change which racer wins.
If you enable the 'Run' toggle, Conducto will run your commands to completion and display the results as they become available.
While 'Run' is enabled, Conducto will automatically run nodes that don't have a result. So if you reset a node you can expect Conducto to rerun it shortly afterwards.
Ideally you'll see several green checkmarks, which means that each node's command returned 0. Click on a node to the left to see how long it ran before completing.
Since we set the node parameter cpu=2 on the root node, each node gets access to two CPU cores.
So if your machine has ten cores available, then "container parallelism" was probably the fastest racer--since each of the five containers had two processor cores to itself.
If fewer cores were available, some children of the Parallel node might not have started until others completed and freed up a core. In this case, maybe "process parallelism" won the race.
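The back-of-the-envelope reasoning here can be made concrete. This toy model (not Conducto's actual scheduler) just counts how many "waves" of workers fit on the machine:

```python
import math

def waves(n_tasks: int, cpu_per_task: int, machine_cores: int) -> int:
    # how many workers fit at once, and how many rounds they need
    at_once = max(1, machine_cores // cpu_per_task)
    return math.ceil(n_tasks / at_once)

# Five workers at cpu=2 on a ten-core machine: everything runs in one wave.
print(waves(5, 2, 10))  # -> 1
# With only four cores, two workers fit at a time, so three waves.
print(waves(5, 2, 4))   # -> 3
```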
This wasn't really a fair race--one strategy used ten CPU cores (two per container), but the other had to do all the work with just two. Let's modify the underpowered node to give it ten cores also:
Having run the pipeline once, we already have half of the results that we need. We only need to rerun one node. But first, we need to correct the number of cpu cores it has access to.
Since the Run toggle is enabled, the node will automatically rerun as soon as you reset it. When it completes, you can compare the times to see how much faster it was.
It takes longer to start a container than it does to start a process, so it's not surprising that five parallel processes are faster than five parallel containers. As the job grows in complexity, however, the additional transparency and control that Conducto provides quickly becomes worth it.
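To get a feel for the per-process half of that comparison, you can time process launches directly. Container startup adds image setup and isolation overhead on top of this, so it is typically much larger:

```python
import subprocess
import time

def avg_launch_seconds(n: int = 10) -> float:
    # launch a trivial command n times and average the wall-clock cost
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run(["true"], check=True)
    return (time.perf_counter() - start) / n

if __name__ == "__main__":
    print(f"~{avg_launch_seconds() * 1000:.1f} ms per process launch")
```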
For instance, you can expand the container parallelism node, skip some of its children, and reset it.
Conducto keeps the data from previous runs around for comparison, so you can see how much the skipped children affected run time.
You can also modify a command, reset the node, and see which execution took longer or used more memory.
Clicking on the various entries in the timeline shows how the timestamped command differs from the current value.
When you're done interacting with this pipeline, you can put it to sleep.
What happens when you sleep a pipeline depends on whether it was launched locally or in the cloud. But in both cases it's the right thing to do if you don't need to see that pipeline anymore.
The "Pipelines" tab at the top hides your sleeping pipelines by default, but you can modify the filter to see them. They stick around for seven days; during that window you can wake them up and continue tinkering.
In the sections above, we explored the ways that you can take an existing pipeline and modify how much computing power each node has access to. We used the Conducto web app to tweak and rerun certain nodes, and to compare results between runs.
There are many other node parameters besides cpu that you can use to control pipelines, and working with them is similar. They are documented in a later article, but if you followed along here, you're probably already prepared to play around with them too.
The examples in this section used a custom image so that tools like parallel were available.
Doing so is a good way to keep the commands in your Exec nodes simple.
In the Images section, you'll learn how to customize images for your own pipelines.
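To preview the idea, a Dockerfile along these lines could install the tools and generate the payloads. This is a hypothetical sketch, not the example repository's actual Dockerfile:

```dockerfile
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y gzip parallel
# noise payloads of increasing size, 50 MB through 250 MB
RUN for i in 1 2 3 4 5; do \
      dd if=/dev/urandom of=/payload$i.dat bs=1M count=$((i * 50)); \
    done
```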