`tsdat.pipeline.pipelines`

Classes

`IngestPipeline`	Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability.
`TransformationPipeline`	Pipeline class designed to read in standardized time series data and enhance its quality and usability.

class tsdat.pipeline.pipelines.IngestPipeline

Bases: tsdat.pipeline.base.Pipeline

Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability by converting it into a standard format, embedding metadata, applying quality checks and controls, generating reference plots, and saving the data in an accessible format so it can be used later in scientific analyses or in higher-level tsdat Pipelines.

Class Methods

`hook_customize_dataset`	Code hook to customize the retrieved dataset prior to qc being applied.
`hook_finalize_dataset`	Code hook to finalize the dataset after qc is applied but before it is saved.
`hook_plot_dataset`	Code hook to create plots for the data which runs after the dataset has been saved.
`run`	Runs the data pipeline on the provided inputs.

Method Descriptions

hook_customize_dataset(self, dataset: xarray.Dataset) → xarray.Dataset

Code hook to customize the retrieved dataset prior to qc being applied.

Parameters: dataset (xr.Dataset) – The output dataset structure returned by the retriever API.
Returns: xr.Dataset – The customized dataset.

hook_finalize_dataset(self, dataset: xarray.Dataset) → xarray.Dataset

Code hook to finalize the dataset after qc is applied but before it is saved.

Parameters: dataset (xr.Dataset) – The dataset returned by the retriever API, modified by the hook_customize_dataset user code hook, and then passed through quality control.
Returns: xr.Dataset – The finalized dataset, ready to be saved.

hook_plot_dataset(self, dataset: xarray.Dataset)

Code hook to create plots for the data which runs after the dataset has been saved.

Parameters: dataset (xr.Dataset) – The dataset to plot.
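
The order in which the three hooks run relative to quality control and saving can be sketched as below. This is a dependency-free stand-in rather than tsdat's implementation: `SketchPipeline`, `MyPipeline`, the private `_retrieve`, `_apply_qc`, and `_save` steps, and the plain `dict` used in place of an `xarray.Dataset` are all hypothetical names chosen for illustration. Only the hook names and their relative ordering come from the descriptions above.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for xarray.Dataset so the sketch needs no dependencies.
Dataset = Dict[str, Any]


class SketchPipeline:
    """Illustrative stand-in showing when each hook runs; not tsdat code."""

    def __init__(self) -> None:
        self.calls: List[str] = []  # records the order in which steps execute

    # The three private steps below are assumed placeholders for retrieval,
    # quality control, and storage.
    def _retrieve(self, inputs: List[str]) -> Dataset:
        self.calls.append("retrieve")
        return {"inputs": list(inputs)}

    def _apply_qc(self, dataset: Dataset) -> Dataset:
        self.calls.append("qc")
        return dataset

    def _save(self, dataset: Dataset) -> None:
        self.calls.append("save")

    # Default hooks do nothing except return the dataset unchanged.
    def hook_customize_dataset(self, dataset: Dataset) -> Dataset:
        self.calls.append("customize")
        return dataset

    def hook_finalize_dataset(self, dataset: Dataset) -> Dataset:
        self.calls.append("finalize")
        return dataset

    def hook_plot_dataset(self, dataset: Dataset) -> None:
        self.calls.append("plot")

    def run(self, inputs: List[str]) -> Dataset:
        dataset = self._retrieve(inputs)
        dataset = self.hook_customize_dataset(dataset)  # before qc
        dataset = self._apply_qc(dataset)
        dataset = self.hook_finalize_dataset(dataset)   # after qc, before save
        self._save(dataset)
        self.hook_plot_dataset(dataset)                 # after save
        return dataset


class MyPipeline(SketchPipeline):
    """Overrides one hook, the way a user subclass would."""

    def hook_customize_dataset(self, dataset: Dataset) -> Dataset:
        dataset = super().hook_customize_dataset(dataset)
        dataset["site"] = "sgp"  # example customization applied before qc
        return dataset


pipeline = MyPipeline()
result = pipeline.run(["example_input.csv"])
print(pipeline.calls)
```

In a real pipeline the same pattern would apply to a subclass of `IngestPipeline`, with each hook receiving and returning an `xarray.Dataset` as documented above.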

run(self, inputs: List[str], **kwargs: Any) → xarray.Dataset

Runs the data pipeline on the provided inputs.

Parameters: inputs (List[str]) – A list of input keys that the pipeline’s Retriever class can use to load data into the pipeline.
Returns: xr.Dataset – The processed dataset.

class tsdat.pipeline.pipelines.TransformationPipeline

Bases: IngestPipeline

Pipeline class designed to read in standardized time series data and enhance its quality and usability by combining multiple sources of data, using higher-level processing techniques, etc.

retriever : tsdat.io.retrievers.StorageRetriever

Class Methods

`run`	Runs the data pipeline on the provided inputs.

Method Descriptions

run(self, inputs: List[str], **kwargs: Any) → xarray.Dataset

Runs the data pipeline on the provided inputs.

Note that input keys for a TransformationPipeline differ from those for an IngestPipeline. Here each input key is expected to follow the standard format `datastream::start-date::end-date`, e.g., `sgp.myingest.b1::20220913.000000::20220914.000000`.

This format allows the retriever to pull datastream data from the Storage API for the desired dates for each desired input source.
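
The key format described above can be split into its three parts with a few lines of standard-library Python. This is a sketch for illustration, not the parsing tsdat performs internally: `InputKey`, `parse_input_key`, and `KEY_TIME_FORMAT` are hypothetical names, and the `YYYYMMDD.hhmmss` timestamp layout is an assumption inferred from the example key.

```python
from datetime import datetime
from typing import NamedTuple

# Assumption: timestamps look like "20220913.000000", as in the example key.
KEY_TIME_FORMAT = "%Y%m%d.%H%M%S"


class InputKey(NamedTuple):
    """Hypothetical container for the parts of a transformation input key."""

    datastream: str
    start: datetime
    end: datetime


def parse_input_key(key: str) -> InputKey:
    """Split 'datastream::start-date::end-date' into typed fields."""
    parts = key.split("::")
    if len(parts) != 3:
        raise ValueError(
            f"expected 'datastream::start-date::end-date', got {key!r}"
        )
    datastream, start_text, end_text = parts
    start = datetime.strptime(start_text, KEY_TIME_FORMAT)
    end = datetime.strptime(end_text, KEY_TIME_FORMAT)
    if end < start:
        raise ValueError(f"end date precedes start date in {key!r}")
    return InputKey(datastream, start, end)


key = parse_input_key("sgp.myingest.b1::20220913.000000::20220914.000000")
print(key.datastream, key.start.isoformat(), key.end.isoformat())
```

Validating keys this way before calling `run` can surface a malformed datastream name or date range early, instead of during retrieval.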

Parameters: inputs (List[str]) – A list of input keys that the pipeline’s Retriever class can use to load data into the pipeline.
Returns: xr.Dataset – The processed dataset.
