Data Science

Ingest, transform, and analyze data

Doing data science means building pipelines that manipulate and analyze data. Write your Conducto data pipeline in Python: the pipeline specifies which commands to run and when. Subtasks that can run in parallel are expressed naturally and automatically run at scale according to the compute resources available. Conducto is agnostic to where or how you store your data; invoke any shell command that calls any Linux package, library, or tool to access and manipulate it. Launch your pipeline into our free cloud or run it for free on your local machine, and upgrade to paid cloud mode for more scale. View and interact with your pipeline in the browser.
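As a sketch of how serial and parallel subtasks are expressed (node names and commands here are hypothetical placeholders; this assumes Conducto's `co.Serial` / `co.Parallel` containers and `co.Exec` nodes as shown in the getting-started example below):

```python
import conducto as co

def pipeline() -> co.Serial:
    # Children of a Serial node run one after another;
    # children of a Parallel node run concurrently.
    with co.Serial(image="python:3.8-slim") as root:
        root["ingest"] = co.Exec("echo download the raw data")
        root["transform"] = transform = co.Parallel()
        transform["clean"] = co.Exec("echo clean the data")
        transform["enrich"] = co.Exec("echo join in reference data")
        root["analyze"] = co.Exec("echo analyze the results")
    return root

if __name__ == "__main__":
    co.main(default=pipeline)
```

Here "clean" and "enrich" would run side by side, while "ingest", "transform", and "analyze" run in order.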

Getting started with Data Science in Conducto requires that you install our Python package:

  • Register.
  • Run pip install conducto.
  • Run conducto-profile add, then log in with your username and password.
  • Save the script below to a Python file.
  • Run python <your_file>.py --cloud --run to launch it into our free cloud.
  • View your first pipeline in the browser.

import conducto as co

# Build a Docker image using the contents of a Git repo.
IMG = co.Image(
    "python:3.8-slim",
    copy_url="", copy_branch="main",
    reqs_packages=["cloc"], reqs_py=["pandas"],
)

def main() -> co.Parallel:
    with co.Parallel(image=IMG) as root:
        # Count lines of code in the remote Git repo.
        root["lines of code"] = co.Exec("cloc .")
        # Run a simple data analysis script located there.
        root["biggest US cities"] = co.Exec(
            "cd features/copy_url && python cities.py")
    return root

if __name__ == "__main__":
    co.main(default=main)

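The script is launched from the command line. A sketch of common invocations, assuming the file was saved as pipeline.py (substitute your own filename; `--cloud --run` comes from the steps above, while `--local` is the assumed flag for running on your own machine):

```shell
# Launch into Conducto's free cloud and start running immediately.
python pipeline.py --cloud --run

# Run on your local machine instead (see `python pipeline.py --help`
# for the full set of launch options).
python pipeline.py --local --run
```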
Now that you have run your first pipeline, you are all set up.

Write your own pipeline

Example pipelines

  • Combine sample user data with transaction data to build a model that predicts customer churn. [Sandbox][GitHub]
  • Download US weather data then visualize it. [Sandbox][GitHub]

Learn more

Join us on Slack in the #data-science channel.

Chat with us for a live demo right now!
(If we're awake 😴)