Getting Started

To get started developing a tsdat pipeline, we suggest following the following steps, which are explained in more detail in the linked sections:

  1. Install tsdat

  2. Get a template

  3. Configure template

  4. Run pipeline

Prerequisites

Tsdat requires Python 3.8+

Installation

You can install tsdat simply by running pip install tsdat in a console window.

Getting a Tsdat Pipeline Template

The quickest way to set up a Tsdat pipeline is to use a GitHub repository template. You can find a list of template repositories for tsdat at https://github.com/tsdat/template-repositories.

Note

Currently, there are only two ingest templates available, but more will be added over time, including support for VAPs, multi-pipeline templates, and specific data models.

  1. Local Ingest Template

    Use this template to run ingest pipelines on your local computer.

  2. AWS Ingest Template

    Use this template to run ingest pipelines on AWS. (It requires an AWS account.)

Once you have selected the template to use, select the “Use this template” button to create a new repository at your specified location with the template contents.

Use a GitHub pipeline repository template to jumpstart tsdat development.

Once you have created a new repository from the template, you can clone your repository to your local desktop and start developing. By default, the repository template will come pre-configured to run out-of-the-box on an example dataset.

See configuring your pipeline for more information on tsdat-specific configuration file and code customizations. In addtion, make sure to read the README.md file associated with your template for any template-specific instructions.

Running Your tsdat Pipeline

Once tsdat is installed and your pipeline template is configured, you can run it on your input data using the following code from a terminal window at the top level of your repository:

python3 run_pipeline.py

By default this will run the pipeline on all files in the data/inputs folder and it will run in ‘dev’ mode, with all outputs going to the storage/root folder. To run the pipeline in production mode on a specific file, use the following syntax:

python3 run_pipeline.py $PATH_TO_YOUR_FILE --mode prod

For command-line help:

python3 run_pipeline.py -h

For detailed examples of how to set up and use tsdat, consult the Examples and Tutorials section.