Generally speaking, a Conducto node's file tree gets cleaned up when the node completes. These paths are an exception:

  • /conducto/data/pipeline
  • /conducto/data/user

They can be used to share data between nodes or between pipelines.

Scope

Pipeline Data

New pipelines start with no data in /conducto/data/pipeline. If a node writes a file there, then every node in that pipeline will be able to read that file.

This is a great place to put data that you want to pass between nodes.
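As a minimal sketch of that write-then-read pattern, the two functions below could run in an upstream node and a downstream node of the same pipeline. The file name results/count.txt and the root parameter are assumptions made for illustration; only the /conducto/data/pipeline path comes from the text above.

```python
from pathlib import Path

# Path from the text above. The `root` parameter exists only so the
# sketch can be tried outside a Conducto container.
PIPELINE_DATA = Path("/conducto/data/pipeline")


def write_result(text: str, root: Path = PIPELINE_DATA) -> Path:
    """Upstream node: save a result where later nodes can find it."""
    target = root / "results" / "count.txt"  # hypothetical file name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target


def read_result(root: Path = PIPELINE_DATA) -> str:
    """Downstream node in the same pipeline: read the saved result."""
    return (root / "results" / "count.txt").read_text()
```

The downstream node should run after the upstream one has finished, so that the file is fully written before it is read.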

User Data

All pipelines created by a user will share access to the same instance of /conducto/data/user, and this data will be kept separate from other users' pipelines.

This lets you write pipelines that analyze data generated by other pipelines. For instance, you could have a daily pipeline that writes to /conducto/data/user/{today}/data and a monthly summary that reads data for each day that month.
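The daily/monthly arrangement might look roughly like the sketch below. It assumes {today} is an ISO date such as 2021-03-05, which the text does not specify, and the function names and root parameter are hypothetical.

```python
import calendar
import datetime as dt
from pathlib import Path

# Path from the text above. `root` is overridable so the sketch can be
# tried outside a Conducto container.
USER_DATA = Path("/conducto/data/user")


def write_daily(payload: str, day: dt.date, root: Path = USER_DATA) -> Path:
    """Daily pipeline: write to <root>/<YYYY-MM-DD>/data (assumed layout)."""
    target = root / day.isoformat() / "data"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(payload)
    return target


def read_month(year: int, month: int, root: Path = USER_DATA) -> dict:
    """Monthly pipeline: collect whatever daily files exist for the month."""
    days_in_month = calendar.monthrange(year, month)[1]
    found = {}
    for day_number in range(1, days_in_month + 1):
        day = dt.date(year, month, day_number)
        path = root / day.isoformat() / "data"
        if path.exists():  # skip days on which the daily pipeline did not run
            found[day] = path.read_text()
    return found
```

Skipping missing days keeps the monthly summary usable even if a daily run failed or has not happened yet.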

Persistence

Cloud pipelines use s3fs-fuse to synchronize these paths with Amazon S3. This comes with some limitations, but you don't have to worry about them as long as you stick to writing a file and then reading it later.

In local mode, Conducto mounts parts of your local filesystem into the container. Inside a node you can usually just use the paths above, but if you need to reach the same data from the host:

  • pipeline data goes in .conducto/{profile-id}/pipelines/{pipeline-id}/data
  • user data goes in .conducto/{profile-id}/data

Conducto creates .conducto in your home folder when you launch your first pipeline, after which you can determine your profile-id with this command:

$ conducto-profile list
Profile 87a5a060
	URL:  https://www.conducto.com
	Organization:  Justice League
	e-mail:  jstewart@ferrisaircraft.com
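With a profile-id in hand, the host-side locations listed above can be assembled as in the following sketch. The helper names are hypothetical, and the layout is taken directly from the two bullet points rather than from any Conducto API.

```python
from pathlib import Path
from typing import Optional


def conducto_dir(home: Optional[Path] = None) -> Path:
    """Return the .conducto folder, which lives in the home folder."""
    return (home or Path.home()) / ".conducto"


def host_pipeline_data(profile_id: str, pipeline_id: str,
                       home: Optional[Path] = None) -> Path:
    """Host-side location of one pipeline's /conducto/data/pipeline."""
    return conducto_dir(home) / profile_id / "pipelines" / pipeline_id / "data"


def host_user_data(profile_id: str, home: Optional[Path] = None) -> Path:
    """Host-side location of /conducto/data/user for a profile."""
    return conducto_dir(home) / profile_id / "data"
```

For example, host_user_data("87a5a060") points at the user data for the profile shown in the listing above.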

Summary

Conducto uses shared filesystems to pass data between nodes. Pipeline data is scoped to the pipeline that created it, and user data is scoped to the user that created it.

Maybe your data is external to Conducto, or a filesystem isn't the right tool for the job. If so, remember that Conducto nodes are just containers around your code, and that code can use whatever data access tools you like (for example, a Python API for working with an external database).