tsdat.pipeline.base

Classes

Pipeline

Base class for tsdat data pipelines.

class tsdat.pipeline.base.Pipeline[source]

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for tsdat data pipelines.

dataset_config :tsdat.config.dataset.DatasetConfig[source]

Describes the structure and metadata of the output dataset.

quality :tsdat.qc.base.QualityManagement[source]

Manages the dataset quality through checks and corrections.

retriever :tsdat.io.base.Retriever[source]

Retrieves data from input keys.

settings :Any[source]
storage :tsdat.io.base.Storage[source]

Stores the dataset so it can be retrieved later.

triggers :List[Pattern] = [][source]

Regex patterns matching input keys to determine when the pipeline should run.
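
As an illustration of how trigger patterns might be checked against input keys, here is a minimal sketch that uses only Python's `re` module. The patterns, the input keys, and the `should_run` helper are hypothetical, and tsdat's own dispatch logic may differ (for example, in whether it uses `match`, `search`, or `fullmatch`).

```python
import re
from typing import List

# Hypothetical trigger patterns; a real pipeline declares its own.
triggers: List[re.Pattern] = [
    re.compile(r".*buoy.*\.csv"),
    re.compile(r".*lidar.*\.nc"),
]


def should_run(input_key: str, patterns: List[re.Pattern]) -> bool:
    """Return True if any pattern matches from the start of the input key."""
    return any(pattern.match(input_key) is not None for pattern in patterns)


print(should_run("data/buoy_20220101.csv", triggers))  # matches the first pattern
print(should_run("data/readme.txt", triggers))         # matches no pattern
```

Because `re.Pattern.match` only anchors at the start of the string, the leading `.*` in each pattern lets the trigger match keys with arbitrary directory prefixes.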

Class Methods

prepare_retrieved_dataset

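Modifies the retrieved dataset by dropping variables not declared in the DatasetConfig, adding static variables, initializing non-retrieved variables, and importing global and variable-level attributes from the DatasetConfig.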

run

Runs the data pipeline on the provided inputs.

Method Descriptions

prepare_retrieved_dataset(self, dataset: xarray.Dataset) → xarray.Dataset[source]

Modifies the retrieved dataset by dropping variables not declared in the DatasetConfig, adding static variables, initializing non-retrieved variables, and importing global and variable-level attributes from the DatasetConfig.

Parameters

dataset (xr.Dataset) – The retrieved dataset.

Returns

xr.Dataset – The dataset with structure and metadata matching the DatasetConfig.
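
The four steps described above can be sketched conceptually with plain dictionaries standing in for `xarray.Dataset` and `DatasetConfig`. This is not tsdat's implementation: the `config` layout, the `prepare` function, and the use of `None` as a placeholder for non-retrieved variables are all assumptions made for illustration.

```python
from typing import Any, Dict

# Hypothetical stand-in for a DatasetConfig: global attributes plus declared variables.
config: Dict[str, Any] = {
    "attrs": {"title": "Example dataset"},
    "variables": {
        "temperature": {"attrs": {"units": "degC"}},
        # A static variable carries its own data in the config.
        "site_id": {"attrs": {"long_name": "Site identifier"}, "data": 42},
        # Declared in the config but absent from the retrieved data.
        "humidity": {"attrs": {"units": "%"}},
    },
}


def prepare(retrieved: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Shape retrieved data to match the config (conceptual sketch only)."""
    variables: Dict[str, Any] = {}
    # Iterating over declared variables implicitly drops undeclared ones.
    for name, spec in config["variables"].items():
        if "data" in spec:
            data = spec["data"]        # add static variable
        elif name in retrieved:
            data = retrieved[name]     # keep retrieved variable
        else:
            data = None                # initialize non-retrieved variable (placeholder)
        # Import variable-level attributes from the config.
        variables[name] = {"data": data, "attrs": dict(spec["attrs"])}
    # Import global attributes from the config.
    return {"attrs": dict(config["attrs"]), "variables": variables}


retrieved = {"temperature": [20.5, 21.0], "debug_flag": [0, 1]}
prepared = prepare(retrieved, config)
print(sorted(prepared["variables"]))
```

In this sketch `debug_flag` is dropped because it is not declared, `site_id` is added from the config, and `humidity` is created with a placeholder value.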

abstract run(self, inputs: List[str], **kwargs: Any) → Any[source]

Runs the data pipeline on the provided inputs.

Parameters
  • inputs (List[str]) – A list of input keys that the pipeline’s Retriever class can use to load data into the pipeline.

Returns

xr.Dataset – The processed dataset.
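
Because `run` is abstract, concrete pipelines must override it. The sketch below shows that contract with a stand-in base class built on `abc`, so it runs without tsdat installed. `PipelineSketch` and `KeyCountPipeline` are hypothetical names, and the body of `run` is a placeholder rather than real retrieval or processing logic.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class PipelineSketch(ABC):
    """Stand-in for tsdat.pipeline.base.Pipeline (assumption: not the real class)."""

    @abstractmethod
    def run(self, inputs: List[str], **kwargs: Any) -> Any:
        """Run the pipeline on the provided input keys."""


class KeyCountPipeline(PipelineSketch):
    """Hypothetical subclass whose 'processing' just summarizes its inputs."""

    def run(self, inputs: List[str], **kwargs: Any) -> Any:
        # A real pipeline would retrieve, prepare, quality-check, and store data here.
        return {"n_inputs": len(inputs), "options": dict(kwargs)}


# The abstract base cannot be instantiated until run() is implemented.
try:
    PipelineSketch()
except TypeError:
    print("PipelineSketch is abstract and cannot be instantiated")

result = KeyCountPipeline().run(["a.csv", "b.csv"], verbose=True)
print(result)
```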