Accessing and saving data is fundamental to many categories of pipelines, including data science, ETL, and analytics. Conducto's philosophy on data is to let you call your own functions to access your own data. Built-in methods are not needed for connecting to different data sources, because you can generate your pipeline using the full power of Python. Lazy Pipeline Creation helps you iterate over your data dynamically, and a few additional features, described below, help with secrets and data storage.
Accessing your own data sources often requires secret keys and tokens that you don’t want to save anywhere in plaintext. Conducto stores these for you, encrypted, in the AWS Parameter Store, where they are accessible only with your own login credentials.
In your Profile in the app, you can set your own user-level secrets, visible to only you. If you are an Admin of your Org, you may also create org-level secrets that are visible to anyone in your org.
Secrets are set as key/value pairs that are passed into the environment of each Exec node. They are visible to your commands but are not displayed to the user in the app.
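Because secrets arrive as ordinary environment variables, a command can read them with the standard library alone. The following is a minimal sketch, not part of the Conducto API: the helper `read_secret` and the secret name `MY_API_TOKEN` are hypothetical and chosen only for illustration.

```python
import os


def read_secret(name: str) -> str:
    """Return the secret called `name` from the process environment.

    Hypothetical helper: it only wraps os.environ and fails loudly when
    the variable is missing or empty.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set in this environment")
    return value


# Outside a pipeline the variable will not exist, so this sketch supplies a
# dummy value. MY_API_TOKEN is an assumed name, not one Conducto defines.
os.environ.setdefault("MY_API_TOKEN", "dummy-token-for-local-testing")

token = read_secret("MY_API_TOKEN")
# Report only the length, so the secret value never lands in the logs.
print(f"Loaded a token of length {len(token)}")
```

Failing early on a missing variable tends to give a clearer error than a rejected request further down the pipeline.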
Conducto provides a simple object store for your data. In local mode it is backed by local disk, and in cloud mode it is backed by S3. Cloud data, like cloud logs, is accessible only to your user.
conducto.data.pipeline gives you a way to store data that is scoped
to the current pipeline. When your pipeline is archived and its logs are cleaned up,
your data is deleted as well. ETL and data science pipelines commonly
need a temporary but high-performance location to stage data, and
conducto.data.pipeline (co.data.pipeline when conducto is imported as co) is one simple option.
conducto.data.user provides a similar interface for data that should
persist past the life of the pipeline. It is scoped by user and is only visible
to the user who created it.
Delete object at name.
get(name, file, local=None)
Get the object at name and store it to file.
gets(name, *, byte_range: Optional[List[int]] = None) → bytes
Return the object at name. Optionally restrict the result to the given byte_range, which is the half-open interval [begin, end): begin is included and end is excluded.
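The half-open convention matches Python slicing, which makes it convenient for reading a large object in pieces: consecutive ranges share a boundary without overlapping. The sketch below uses a hypothetical helper, `byte_ranges`, to split a known object size into [begin, end) pairs. A plain bytes value stands in for the stored object, so no Conducto calls are made; in a pipeline, each pair could presumably be passed as byte_range.

```python
from typing import List


def byte_ranges(total_size: int, chunk_size: int) -> List[List[int]]:
    """Split [0, total_size) into consecutive half-open [begin, end) pairs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        [begin, min(begin + chunk_size, total_size)]
        for begin in range(0, total_size, chunk_size)
    ]


# Stand-in for an object's contents; a real object would live in the store.
payload = b"0123456789"

ranges = byte_ranges(len(payload), 4)
print(ranges)  # [[0, 4], [4, 8], [8, 10]]

# Slicing with [begin:end] follows the same half-open rule, so joining the
# chunks reproduces the payload with no gaps or duplicated bytes.
chunks = [payload[begin:end] for begin, end in ranges]
assert b"".join(chunks) == payload
```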
Return names of objects that start with prefix.
put(name, file, skip_cleanup=False)
Store the contents of file as the object at name.
Return the size of the object at name, in bytes.
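To make the put/get/gets round trip concrete without requiring a running pipeline, the sketch below exercises those three calls against a small disk-backed stand-in. `LocalDiskStore` and `stage_and_reload` are hypothetical names written for this example; they are not Conducto's implementation and mirror only the signatures listed above, leaving out the optional local and skip_cleanup arguments. In a real pipeline one would presumably pass conducto.data.pipeline or conducto.data.user in place of the stand-in, assuming they accept the same calls; that substitution is untested here.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional


class LocalDiskStore:
    """Hypothetical stand-in that keeps each object as a file under `root`."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, name: str, file: str) -> None:
        """Copy the local `file` into the store under `name`."""
        target = self.root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(file, target)

    def get(self, name: str, file: str) -> None:
        """Copy the object at `name` out of the store into `file`."""
        shutil.copyfile(self.root / name, file)

    def gets(self, name: str, *, byte_range: Optional[List[int]] = None) -> bytes:
        """Return the object's bytes, optionally only [begin, end)."""
        data = (self.root / name).read_bytes()
        if byte_range is None:
            return data
        begin, end = byte_range
        return data[begin:end]


def stage_and_reload(store, workdir: str) -> bytes:
    """Stage a small CSV, copy it back out, and return its first 8 bytes."""
    source = Path(workdir) / "rows.csv"
    source.write_bytes(b"id,value\n1,10\n2,20\n")
    store.put("staging/rows.csv", str(source))

    copy = Path(workdir) / "rows_copy.csv"
    store.get("staging/rows.csv", str(copy))
    assert copy.read_bytes() == source.read_bytes()

    # Bytes 0 through 7 hold the header text; byte 8 (the newline) is excluded.
    return store.gets("staging/rows.csv", byte_range=[0, 8])


with tempfile.TemporaryDirectory() as tmp:
    store = LocalDiskStore(str(Path(tmp) / "store"))
    print(stage_and_reload(store, tmp))  # b'id,value'
```

Writing the staging logic against a passed-in store object keeps it testable on a laptop, at the cost of one extra parameter.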