tsdat.pipeline.pipelines

Classes

IngestPipeline

Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability by converting it into a standard format.

TransformationPipeline

Pipeline class designed to read in standardized time series data and enhance its quality and usability by combining multiple sources of data and applying higher-level processing techniques.

class tsdat.pipeline.pipelines.IngestPipeline

Bases: tsdat.pipeline.base.Pipeline

Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability by converting it into a standard format, embedding metadata, applying quality checks and controls, generating reference plots, and saving the data in an accessible format so it can be used later in scientific analyses or in higher-level tsdat Pipelines.

Class Methods

hook_customize_dataset

Code hook to customize the retrieved dataset prior to QC being applied.

hook_finalize_dataset

Code hook to finalize the dataset after QC is applied but before it is saved.

hook_plot_dataset

Code hook to create plots for the data; it runs after the dataset has been saved.

run

Runs the data pipeline on the provided inputs.

Method Descriptions

hook_customize_dataset(self, dataset: xarray.Dataset) → xarray.Dataset

Code hook to customize the retrieved dataset prior to QC being applied.

Parameters

dataset (xr.Dataset) – The output dataset structure returned by the retriever API.

Returns

xr.Dataset – The customized dataset.
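
A minimal sketch of overriding this hook in an IngestPipeline subclass; the class name, variable name, and unit conversion are hypothetical.

    import xarray as xr
    from tsdat.pipeline.pipelines import IngestPipeline

    class BuoyIngest(IngestPipeline):  # hypothetical pipeline subclass
        def hook_customize_dataset(self, dataset: xr.Dataset) -> xr.Dataset:
            # Hypothetical fix-up: convert "air_temperature" from degF to degC
            # before quality checks are applied.
            if "air_temperature" in dataset:
                dataset["air_temperature"] = (dataset["air_temperature"] - 32.0) * 5.0 / 9.0
                dataset["air_temperature"].attrs["units"] = "degC"
            return dataset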

hook_finalize_dataset(self, dataset: xarray.Dataset) → xarray.Dataset

Code hook to finalize the dataset after QC is applied but before it is saved.

Parameters

dataset (xr.Dataset) – The output dataset returned by the retriever API and modified by the hook_customize_dataset user code hook.

Returns

xr.Dataset – The finalized dataset, ready to be saved.
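
A sketch of a finalization hook, again assuming a hypothetical BuoyIngest subclass and variable name; it records a provenance note and rounds the QC-adjusted values before the dataset is saved.

    import xarray as xr
    from tsdat.pipeline.pipelines import IngestPipeline

    class BuoyIngest(IngestPipeline):  # hypothetical pipeline subclass
        def hook_finalize_dataset(self, dataset: xr.Dataset) -> xr.Dataset:
            # Hypothetical finishing touches applied after QC but before saving.
            dataset.attrs["processing_notes"] = "Finalized by BuoyIngest hook"
            if "air_temperature" in dataset:
                dataset["air_temperature"] = dataset["air_temperature"].round(2)
            return dataset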

hook_plot_dataset(self, dataset: xarray.Dataset)

Code hook to create plots for the data; it runs after the dataset has been saved.

Parameters

dataset (xr.Dataset) – The dataset to plot.
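
A sketch of a plotting hook using matplotlib. The variable name and output path are placeholders; a real pipeline would typically save the figure through tsdat's storage layer so it ends up alongside the processed dataset.

    import matplotlib.pyplot as plt
    import xarray as xr
    from tsdat.pipeline.pipelines import IngestPipeline

    class BuoyIngest(IngestPipeline):  # hypothetical pipeline subclass
        def hook_plot_dataset(self, dataset: xr.Dataset):
            # Quick-look time series plot of a hypothetical 1-D variable.
            fig, ax = plt.subplots()
            dataset["air_temperature"].plot(ax=ax)
            ax.set_title("Air temperature quick-look")
            fig.savefig("air_temperature_quicklook.png")  # placeholder output path
            plt.close(fig)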

run(self, inputs: List[str], **kwargs: Any) → xarray.Dataset

Runs the data pipeline on the provided inputs.

Parameters
inputs (List[str]) – A list of input keys that the pipeline’s Retriever class can use to load data into the pipeline.

Returns

xr.Dataset – The processed dataset.
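
A sketch of invoking the pipeline, assuming the PipelineConfig-based setup used by recent tsdat versions; the config path and input file are placeholders.

    from tsdat import PipelineConfig

    config = PipelineConfig.from_yaml("config/pipeline.yaml")  # placeholder path
    pipeline = config.instantiate_pipeline()  # the configured IngestPipeline
    dataset = pipeline.run(["data/buoy_20230101.csv"])  # input keys: raw file paths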

class tsdat.pipeline.pipelines.TransformationPipeline

Bases: IngestPipeline

Pipeline class designed to read in standardized time series data and enhance its quality and usability by combining multiple sources of data, using higher-level processing techniques, etc.

class Parameters

Bases: pydantic.BaseModel

datastreams: List[str]

A list of datastreams that the pipeline should be configured to run for. Datastreams should include the location and data level information.

parameters: TransformationPipeline.Parameters
retriever: tsdat.io.retrievers.StorageRetriever
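
Datastream identifiers therefore look like the (hypothetical) values below, each carrying a location and a data level; in practice they are usually supplied through the pipeline's YAML configuration rather than constructed directly.

    from tsdat.pipeline.pipelines import TransformationPipeline

    # Hypothetical datastreams, each encoding a location ("sgp") and data level ("b1").
    params = TransformationPipeline.Parameters(
        datastreams=["sgp.met_temperature.b1", "sgp.met_humidity.b1"]
    )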

Class Methods

hook_customize_input_datasets

Code hook to customize any input datasets prior to datastreams being combined and data converters being run.

run

Runs the data pipeline on the provided inputs.

Method Descriptions

hook_customize_input_datasets(self, input_datasets: Dict[str, xarray.Dataset], **kwargs: Any) → Dict[str, xarray.Dataset]

Code hook to customize any input datasets prior to datastreams being combined and data converters being run.

Parameters

input_datasets (Dict[str, xr.Dataset]) – The dictionary of input key (str) to input dataset. Note that for transformation pipelines, input keys are not input filenames; rather, each input key is a combination of the datastream and the date range used to pull the input data from the storage retriever.

Returns

Dict[str, xr.Dataset] – The customized input datasets.
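
A sketch of this hook in a hypothetical TransformationPipeline subclass; the datastream substring and the sort-by-time clean-up are illustrative only.

    from typing import Any, Dict
    import xarray as xr
    from tsdat.pipeline.pipelines import TransformationPipeline

    class MetVapPipeline(TransformationPipeline):  # hypothetical subclass
        def hook_customize_input_datasets(
            self, input_datasets: Dict[str, xr.Dataset], **kwargs: Any
        ) -> Dict[str, xr.Dataset]:
            customized: Dict[str, xr.Dataset] = {}
            for input_key, ds in input_datasets.items():
                # Input keys encode datastream + date range, so match on a substring.
                if "met_temperature" in input_key:  # hypothetical datastream
                    ds = ds.sortby("time")  # ensure monotonic time before combining
                customized[input_key] = ds
            return customized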

run(self, inputs: List[str], **kwargs: Any) → xarray.Dataset

Runs the data pipeline on the provided inputs.

Parameters

inputs (List[str]) – A two-element list containing the start date and end date that the pipeline should process.

Returns

xr.Dataset – The processed dataset.
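
A sketch of running a transformation pipeline over a date range; the config path and date strings are placeholders, and the exact date format expected depends on the storage retriever configuration.

    from tsdat import PipelineConfig

    config = PipelineConfig.from_yaml("config/vap_pipeline.yaml")  # placeholder path
    pipeline = config.instantiate_pipeline()
    # Inputs are a [start, end] date range rather than file paths.
    dataset = pipeline.run(["20230101.000000", "20230102.000000"])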