The sections below are about the conceptual building blocks needed to work with Conducto pipelines. Each section links to a more in-depth article about that concept. If you haven't launched a Conducto pipeline yet, you might want to check out Getting Started before digging into the articles here.
A Conducto pipeline is a tree that controls how your commands are executed.
Exec nodes are the leaves of the tree--they run shell commands in containers.
Serial and Parallel nodes make up the rest of the tree--they control whether their children run in sequence or concurrently.
Conducto provides an API so you can arrange these nodes in code to make a pipeline definition.
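As a rough sketch of those semantics--in plain Python, not the Conducto API (the class names below are invented for illustration)--a pipeline is just a tree whose internal nodes decide how the leaf commands get scheduled:

```python
# A toy model of the pipeline tree, NOT the Conducto API.
# The classes here are invented for illustration only.

class Exec:
    def __init__(self, command):
        self.command = command

    def walk(self):
        # A leaf: yield its shell command.
        yield self.command

class Serial:
    def __init__(self, *children):
        self.children = children

    def walk(self):
        # An internal node: yield each child's commands in order.
        # (A Parallel node would be free to interleave them instead.)
        for child in self.children:
            yield from child.walk()

pipeline = Serial(
    Exec("make build"),
    Serial(Exec("make test"), Exec("make lint")),
    Exec("make deploy"),
)

print(list(pipeline.walk()))
# ['make build', 'make test', 'make lint', 'make deploy']
```

A real definition uses Conducto's own node classes and is launched from the command line, but the tree shape is the same idea.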
You can run any pipeline definition and then control it using conducto.com/app. From there you can run the pipeline, examine results, make changes, and rerun all or part of the pipeline. The history of each node execution is saved with the pipeline, so you can see the impact of any changes you made.
When you launch a pipeline, you choose between local and cloud mode. In local mode, your commands run in containers on your local machine. In cloud mode, Conducto provisions resources for your pipeline and the computation happens in the cloud.
Local mode is pretty powerful on its own, but cloud mode lets you scale your pipeline beyond what local resources would allow. It also lets you ensure that resources are always available to run pipelines when they're needed.
Local mode is free, as is limited cloud mode. To use the fully scalable cloud mode, you first need to enable billing for your Conducto org.
When an Exec node runs a command, Conducto creates a container for that command based on a Docker image.
The container's contents may change based on your command, but it starts with an initial disk image that remains fixed. You can control which software is available to the container by modifying the image before the container gets created. If you make changes, you'll want to be aware of whether you're changing something in the image or something in the container, because that affects how much reprocessing is needed before you can see your changes in action.
Certain parts of a node's filesystem allow you to share data.
- /conducto/data/pipeline is shared by every node in a given pipeline. You can write a value here in one node and expect it to be available in a later node.
- /conducto/data/user is shared by every node of every pipeline that your user creates. This is useful if you have multiple pipelines that need access to the same data.
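This write-then-read pattern can be sketched in plain Python. The temporary directory below stands in for /conducto/data/pipeline, which in a real pipeline is mounted into every node's container:

```python
import pathlib
import tempfile

# Stand-in for /conducto/data/pipeline; in a real pipeline each node
# would see the same mounted path rather than this temp directory.
data = pathlib.Path(tempfile.mkdtemp())

# An earlier node writes a value...
(data / "build_id.txt").write_text("build-1234\n")

# ...and a later node reads it back.
build_id = (data / "build_id.txt").read_text().strip()
print(build_id)  # build-1234
```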
Environment variables are a good place for secrets and other context-sensitive parameters. Store values like these in the Secrets section of your Conducto profile, and Conducto will provide them as environment variables at node runtime.
You can also specify environment variables explicitly in your pipeline definitions by using the env node parameter.
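At runtime, a command inside an Exec node reads these values like any other environment variable. A minimal runnable sketch (API_TOKEN is a hypothetical secret name, not something Conducto defines):

```python
import os

# Inside an Exec node, a secret from your Conducto profile shows up as
# an ordinary environment variable. API_TOKEN is a hypothetical name;
# setdefault keeps this sketch runnable outside a pipeline.
os.environ.setdefault("API_TOKEN", "dummy-value-for-this-sketch")

token = os.environ["API_TOKEN"]
print(f"token is {len(token)} characters long")  # never log the secret itself
```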
Exec nodes run a shell command in a container.
Conducto reuses these containers for two reasons: for speed, to save on startup time, and for state, to allow multiple nodes to build up the container's state incrementally.
Conducto makes it easy to reproduce errors and debug them in place.
If you arrange your pipeline well, the Conducto web app will quickly draw your attention to problem areas. But it can only zoom in so closely--at some point you're going to need tools that are specifically suited for that node. In the web app, click the debug button to copy a command for your terminal.
Pasting it into your terminal will give you an interactive shell session in a container that's just about to run the suspicious command, so you can poke around, attach a debugger, or work whatever magic you need. When you're ready to catch that bug, you can run the same script that Conducto would run and expect the same behavior that you'd see in-pipeline.
If you're writing your pipeline definition in Python and you want your node to call a Python function, there's a shortcut.
Rather than setting up your Exec nodes to run shell commands that invoke python, initialize the Exec node directly with the function that you want it to call.
That node will call your function, and you won't have to worry about a command line interface for it.
It's just a stylistic choice, but when it fits it can make your pipeline much more readable.
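A minimal sketch of the idea: the commented-out calls contrast the two styles, but the exact Conducto call signature isn't reproduced here.

```python
# The function you want a node to run -- no argparse/CLI wrapper needed.
def greet(name: str) -> str:
    return f"hello, {name}"

# Shell-command style, which requires you to maintain a CLI for greet:
#   co.Exec("python greet.py --name world")
#
# Function style (the shortcut, sketched): hand the function and its
# arguments to the Exec node and Conducto handles the invocation:
#   co.Exec(greet, name="world")

# Either way, what ultimately runs is just:
result = greet(name="world")
print(result)  # hello, world
```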
Pipelines carry sensitive data, and security is an important part of Conducto's design.
In order for you to control pipelines remotely, there needs to be an agent on the local machine that maintains a connection to conducto.com and listens for events. If you launch a pipeline with a local command, an agent is started automatically; otherwise, the Conducto web app will give you a command to start one.
Lastly, you should know that Conducto pipelines aren't security sandboxes. You should think twice before launching pipelines from untrusted sources--especially if you're running them in local mode.
That's it for Basics. If you have a complex task that needs to be done repeatedly, the articles above should give you what you need to build a pipeline for it.
- Still have questions?
- Want to show off a cool pipeline that you built?
- Have an idea for how Conducto could be even more useful?
Whatever it is, feel free to drop us a line. Otherwise, happy pipelining.