tsdat.pipeline.pipeline

Module Contents

Classes

Pipeline

This class serves as the base class for all tsdat data pipelines.

class tsdat.pipeline.pipeline.Pipeline(pipeline_config: Union[str, tsdat.config.Config], storage_config: Union[str, tsdat.io.DatastreamStorage])[source]

Bases: abc.ABC

This class serves as the base class for all tsdat data pipelines.

Parameters
  • pipeline_config (Union[str, Config]) – The pipeline config file. Can be either a config object, or the path to the pipeline config file that should be used with this pipeline.

  • storage_config (Union[str, DatastreamStorage]) – The storage config file. Can be either a config object, or the path to the storage config file that should be used with this pipeline.

abstract run(self, filepath: Union[str, List[str]])[source]

This method is the entry point for the pipeline. It will take one or more file paths and process them from start to finish. All classes extending the Pipeline class must implement this method.

Parameters

filepath (Union[str, List[str]]) – The path or list of paths to the file(s) to run the pipeline on.

standardize_dataset(self, raw_mapping: Dict[str, xarray.Dataset])xarray.Dataset[source]

Standardizes the dataset by applying variable name and units conversions as defined by the pipeline config file. This method returns the standardized dataset.

Parameters

raw_mapping (Dict[str, xr.Dataset]) – The raw dataset mapping.

Returns

The standardized dataset.

Return type

xr.Dataset

check_required_variables(self, dataset: xarray.Dataset, dod: tsdat.config.DatasetDefinition)[source]

Function to throw an error if a required variable could not be retrieved.

Parameters
  • dataset (xr.Dataset) – The dataset to check.

  • dod (DatasetDefinition) – The DatasetDefinition used to specify required variables.

Raises

Exception – Raises an exception to indicate the variable could not be retrieved.

add_static_variables(self, dataset: xarray.Dataset, dod: tsdat.config.DatasetDefinition)xarray.Dataset[source]

Uses the DatasetDefinition to add static variables (variables whose data are defined in the pipeline config file) to the output dataset.

Parameters
  • dataset (xr.Dataset) – The dataset to add static variables to.

  • dod (DatasetDefinition) – The DatasetDefinition to pull data from.

Returns

The original dataset with added variables from the config

Return type

xr.Dataset

add_missing_variables(self, dataset: xarray.Dataset, dod: tsdat.config.DatasetDefinition)xarray.Dataset[source]

Uses the dataset definition to initialize variables that are defined in the dataset definiton but did not have input. Uses the appropriate shape and _FillValue to initialize each variable.

Parameters
  • dataset (xr.Dataset) – The dataset to add the variables to.

  • dod (DatasetDefinition) – The DatasetDefinition to use.

Returns

The original dataset with variables that still need to be initialized, initialized.

Return type

xr.Dataset

add_attrs(self, dataset: xarray.Dataset, raw_mapping: Dict[str, xarray.Dataset], dod: tsdat.config.DatasetDefinition)xarray.Dataset[source]

Adds global and variable-level attributes to the dataset from the DatasetDefinition object.

Parameters
  • dataset (xr.Dataset) – The dataset to add attributes to.

  • raw_mapping (Dict[str, xr.Dataset]) – The raw dataset mapping. Used to set the input_files global attribute.

  • dod (DatasetDefinition) – The DatasetDefinition containing the attributes to add.

Returns

The original dataset with the attributes added.

Return type

xr.Dataset

get_previous_dataset(self, dataset: xarray.Dataset)xarray.Dataset[source]

Utility method to retrieve the previous set of data for hte same datastream as the provided dataset from the DatastreamStorage.

Parameters

dataset (xr.Dataset) – The reference dataset that will be used to search the DatastreamStore for prior data.

Returns

The previous dataset from the DatastreamStorage if it exists, otherwise None.

Return type

xr.Dataset

reduce_raw_datasets(self, raw_mapping: Dict[str, xarray.Dataset], definition: tsdat.config.DatasetDefinition)List[xarray.Dataset][source]

Removes unused variables from each raw dataset in the raw mapping and performs input to output naming and unit conversions as defined in the dataset definition.

Parameters
  • raw_mapping (Dict[str, xr.Dataset]) – The raw xarray dataset mapping.

  • definition (DatasetDefinition) – The DatasetDefinition used to select the variables to keep.

Returns

A list of reduced datasets.

Return type

List[xr.Dataset]

reduce_raw_dataset(self, raw_dataset: xarray.Dataset, variable_definitions: List[tsdat.config.VariableDefinition], definition: tsdat.config.DatasetDefinition)xarray.Dataset[source]

Removes unused variables from the raw dataset provided and keeps only the variables and coordinates pertaining to the provdided variable definitions. Also performs input to output naming and unit conversions as defined in the DatasetDefinition.

Parameters
  • raw_dataset (xr.Dataset) – The raw dataset mapping.

  • variable_definitions (List[VariableDefinition]) – List of variables to keep.

  • definition (DatasetDefinition) – The DatasetDefinition used to select the variables to keep.

Returns

The reduced dataset.

Return type

xr.Dataset

decode_cf(self, dataset: xarray.Dataset)xarray.Dataset[source]

Decodes the dataset according to CF conventions. This helps ensure that the dataset is formatted correctly after it has been constructed from unstandardized sources or heavily modified. :param dataset: The dataset to decode. :type dataset: xr.Dataset

Returns

The decoded dataset.

Return type

xr.Dataset