Compare Conducto to Airflow

By: Jonathan Marcus

Conducto vs Airflow

Debugging

Conducto makes it easy to find errors, reproduce them in a click, debug with local code, and update the pipeline to continue without starting over.

  • Find errors: The UI shows the pipeline and logs side-by-side, so you can quickly see errors in context.
  • Reproduce in a click: Conducto tasks run in Docker containers. With one click Conducto will start your task in an interactive container that reproduces the exact environment of the error.
  • Debug with local code: Conducto mounts your local code into the container. Use your native editor to add debug code and test your uncommitted changes.
  • Update pipeline: Once you've fixed the bug, you can rebuild the Docker image. Then retry the errored node and let your pipeline run to completion with your new code.

Fix specification bugs by modifying commands, environments, or resources on the fly. Some bugs don't need to be fixed and can simply be skipped.
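To make "specification" concrete, here is a minimal, hypothetical sketch of how a node's command, environment, and resources are declared in a Conducto pipeline script. The parameter names (env, cpu, mem) follow Conducto's documented node options, but exact names and defaults may vary by version, and train.py is just a placeholder.

    import conducto as co

    def pipeline() -> co.Serial:
        # A node's command, environment, and resource requests are its
        # "specification"; in the Conducto UI these can be edited on a live
        # node and the node re-run without restarting the whole pipeline.
        with co.Serial(image="python:3.8") as root:
            co.Exec(
                "python train.py --epochs 10",   # command (train.py is hypothetical)
                name="train",
                env={"LOG_LEVEL": "DEBUG"},      # environment variables
                cpu=2,                           # resource requests; parameter names
                mem=4,                           #   assumed from Conducto's node options
            )
        return root

    if __name__ == "__main__":
        co.main(default=pipeline)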

All of these features together make it very easy to debug errors in Conducto, even complex errors in long-running pipelines.

Debugging and testing Airflow pipelines is hard. Reproducing the execution environment is a major challenge, and "running an Airflow DAG on your local machine is often not possible due to dependencies on external systems." There is no way to skip tasks ad hoc in a running pipeline, and any manual intervention requires changing the code and starting over from the beginning. Users have gotten very creative, but debugging is not one of Airflow's strong suits.

Infrastructure

To run Conducto locally, just pip install conducto and install Docker. For more scale, Conducto Cloud manages the infrastructure for you, letting you run hundreds of tasks simultaneously. There is a limited free cloud tier; beyond that, pricing is pay-per-use, so you can scale from zero and pay only for what you need. This local-to-cloud portability is possible because all tasks run in containers.
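As a rough illustration of that portability, the same script defines the pipeline whether you launch it on your laptop or in Conducto Cloud. The launch flags in the comments are assumptions based on Conducto's documented workflow and may differ by version.

    import conducto as co

    def hello() -> co.Serial:
        # The pipeline definition is the same whether it runs on your laptop or
        # in Conducto Cloud, because every task runs in a Docker container.
        with co.Serial(image="alpine:3.12") as root:
            co.Exec("echo hello from a container", name="hello")
        return root

    if __name__ == "__main__":
        # Launch locally with `python hello.py --local`, or in Conducto Cloud
        # with `python hello.py --cloud` (flags assumed from Conducto's
        # documented workflow; see `python hello.py --help`).
        co.main(default=hello)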

Running Airflow requires you to set up and operate your own cluster. Scaling it to fit your workload is a challenge, so you often end up paying for expensive overprovisioning, and running it locally is not an option.

Also, because Conducto pipelines run entirely in containers, they are exactly reproducible on your teammates' machines. Airflow environments are difficult to reproduce exactly.

Dynamic pipelines

In a simple Extract->Transform->Load pipeline, you may want to Transform your data in parallel, but your choice of how to parallelize can depend on the output of the Extract step.

Airflow pipelines are defined in Python, but the DAG structure is static: it cannot change during a run or adapt to the output of an upstream task.

Conducto pipelines are defined dynamically, and can even be extended during the run. The Python API lets you write your pipeline with loops, data sources, and business logic. You can easily branch or scale as needed.
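Here is a minimal sketch of that Extract->Transform->Load shape in Conducto's Python API. The extract.py, transform.py, and load.py scripts and the shard list are placeholders; in practice the shard list would come from the Extract step's output.

    import conducto as co

    # Hypothetical shard list; in a real pipeline it would be derived from what
    # the Extract step actually produced.
    SHARDS = ["shard-1", "shard-2", "shard-3"]

    def etl() -> co.Serial:
        with co.Serial(image="python:3.8") as root:
            co.Exec("python extract.py", name="Extract")
            # Ordinary Python drives the pipeline's shape: loop over however
            # many shards exist and fan out one node per shard.
            with co.Parallel(name="Transform"):
                for shard in SHARDS:
                    co.Exec(f"python transform.py --shard {shard}", name=shard)
            co.Exec("python load.py", name="Load")
        return root

    if __name__ == "__main__":
        co.main(default=etl)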

Example pipelines

  • Combine sample user data with transaction data to build a model that predicts customer churn. [Sandbox] [GitHub]

  • Download genomic data and analyze it in a notebook-style presentation. [Sandbox] [GitHub]