Controlling a Pipeline

Overview

Whether a pipeline is running locally, in the cloud, or on a teammate's machine, conducto.com/app is the place to view and control it. This article will show you how to use the Conducto web app to do so.

If you want to know about defining pipelines, check out Pipeline Structure. Or see Local vs Cloud to learn about the options you have when launching one.

The example below demonstrates several ways of manipulating a launched pipeline. It assumes that you have Conducto and Docker installed. If you're not sure about this, head over to Getting Started.

We'll start with a definition containing three nodes that do the same job in different ways. Then we'll launch a pipeline from that definition and modify its node parameters. It's set up like a race, so our modifications will affect who wins.

A Compression Race

Our example pipelines are on GitHub. This page uses the one called "Compression Race".

Its commands have access to GNU Parallel, gzip, and payload files containing between 50 MB and 250 MB of noise. If you're curious about how these things got there, feel free to skip ahead to Images, where you'll learn about Dockerfiles.
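If you'd like a similar noise file to experiment with locally, a minimal Python sketch can generate one. The file name here is hypothetical, and this is not necessarily how the example's payloads were produced; we also write just 1 MB rather than 50-250 MB:

```python
import os

# Write 1 MB of random noise to a sample payload file.
# The example's real payloads range from 50 MB to 250 MB.
SIZE = 1024 * 1024
with open("payload_sample.dat", "wb") as f:
    f.write(os.urandom(SIZE))

print(os.path.getsize("payload_sample.dat"), "bytes written")
```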

Let's clone the example and look at the pipeline's structure:

$ git clone https://github.com/conducto/examples
$ cd examples/compression_race
$ python pipeline.py

    /
    ├─0 container parallelism
    │ ├─ worker 0   gzip -1 payload1.dat
    │ ├─ worker 1   gzip -1 payload2.dat
    │ ├─ worker 2   gzip -1 payload3.dat
    │ ├─ worker 3   gzip -1 payload4.dat
    │ └─ worker 4   gzip -1 payload5.dat
    ├─1 process parallelism   ls payload*.dat | parallel gzip -1
    └─2 no parallelism   gzip -1 payload*.dat

This pipeline encapsulates three strategies for compressing five files:

  1. In five containers
  2. In a single container, with five processes in parallel
  3. In a single container, sequentially in a single process
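Outside of Conducto, the difference between the last two strategies can be sketched in plain Python. This is a hypothetical illustration using only the standard library, not the example's actual code:

```python
import gzip
import shutil
from concurrent.futures import ProcessPoolExecutor

def compress(path):
    """Compress one file at the fastest level, like `gzip -1` (keeps the original)."""
    out = path + ".gz"
    with open(path, "rb") as src, gzip.open(out, "wb", compresslevel=1) as dst:
        shutil.copyfileobj(src, dst)
    return out

def compress_sequentially(paths):
    # Strategy 3: a single process handles every file in turn.
    return [compress(p) for p in paths]

def compress_in_parallel(paths):
    # Strategy 2: one worker process per file, all running at once.
    with ProcessPoolExecutor(max_workers=len(paths)) as pool:
        return list(pool.map(compress, paths))
```

With enough free cores, `compress_in_parallel` finishes in roughly the time of the single slowest file, while `compress_sequentially` takes the sum of all of them.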

Launch a Pipeline

Launch a pipeline for this definition by running pipeline.py and passing --local. For more about local mode, see Local vs Cloud.

$ python pipeline.py --local

Conducto will look at the definition and launch a pipeline with a random name (mine was 'bei-hou'). Then it will direct you to a URL like www.conducto.com/app/p/bei-hou.

When you view a freshly launched pipeline in your browser, all nodes will be pending, and the 'Run' toggle will be unset.

A fresh pipeline shown via the web app

Since this pipeline is set up like a race, changing how many CPU cores each node has access to will change which racer wins.

Run a Pipeline

If you enable the 'Run' toggle, Conducto will run your commands to completion and display the results as they become available.

A running pipeline shown via the web app

While 'Run' is enabled, Conducto will automatically run nodes that don't have a result. So if you reset a node you can expect Conducto to rerun it shortly afterwards.

Ideally you'll see several green checkmarks, which means that each node's command exited with status 0. Click a node on the left to see how long it ran before completing.

Initial run

Since we set the node parameter cpu=2 on the root node, each node gets access to two CPU cores. So if your machine has ten cores available, then "container parallelism" was probably the fastest racer, since each of the five containers had two cores to itself. If fewer cores were available, some children of the "container parallelism" node might not have started until others completed and freed up a core. In that case, "process parallelism" may have won the race.
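To predict which strategy should win on your machine, you can check how many cores Python sees:

```python
import os

# Cores visible to this machine; with cpu=2 per node, roughly
# cores // 2 two-core containers can run at the same time.
cores = os.cpu_count()
print(f"{cores} cores -> about {cores // 2} two-core containers at once")
```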

This wasn't really a fair race: one strategy used ten CPU cores (two per container), but the other had to do all the work with just two. Let's modify the underpowered node to give it ten cores as well.

Changing Node Parameters

Having run the pipeline once, we already have most of the results that we need; only one node has to be rerun. But first, we need to correct the number of CPU cores it has access to.

Adding more cores to a node

Since the 'Run' toggle is enabled, the node will automatically rerun as soon as you reset it. When it completes, you can compare the times to see how much faster it was.

Result Comparison

It takes longer to start a container than it does to start a process, so it's not surprising that five parallel processes are faster than five parallel containers. As the job grows in complexity, however, the additional transparency and control that Conducto provides quickly becomes worth it.

Control a Single Node

For instance, you can expand the "container parallelism" node, skip some of its children, and reset it. Conducto keeps the data from previous runs around for comparison, so you can see how much the skipped children affected run time.

Skipped Nodes

You can also modify a command, reset the node, and see which execution took longer or used more memory.

Modify Command

Clicking on the various entries in the timeline shows how the timestamped command differs from the current value.

Put a Pipeline to Sleep

When you're done interacting with this pipeline, you can put it to sleep.

Top Bar with Sleep Button

What happens when you sleep a pipeline depends on whether it was launched locally or in the cloud. But in both cases it's the right thing to do if you don't need to see that pipeline anymore.

The "Pipelines" tab at the top will hide your sleeping pipelines by default, but you can modify the filter to see them. By default they stick around for seven days; during that window you can wake them up and continue tinkering.

Conclusion

In the sections above, we explored the ways that you can take an existing pipeline and modify how much computing power each node has access to. We used the Conducto web app to tweak and rerun certain nodes, and to compare results between runs.

There are many other node parameters besides cpu that you can use to control pipelines, but working with them is similar. They are documented in a later article, but if you followed along here, you're probably already prepared to play around with them too. You can find a list of supported node parameters on the Node class.

The examples in this section used a custom image so that tools like gzip and parallel were available. Doing so is a good way to keep the commands in your Exec nodes simple. In the Images section, you'll learn how to customize images for your own pipelines.

