tsdat.pipeline.pipelines
Classes
IngestPipeline – Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability.
TransformationPipeline – Pipeline class designed to read in standardized time series data and enhance its quality and usability.
- class tsdat.pipeline.pipelines.IngestPipeline
Bases: tsdat.pipeline.base.Pipeline
Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability by converting it into a standard format, embedding metadata, applying quality checks and controls, generating reference plots, and saving the data in an accessible format so it can be used later in scientific analyses or in higher-level tsdat Pipelines.
Class Methods
get_ancillary_filepath – Returns the path to where an ancillary file should be saved so that it can be synced to the storage area automatically.
hook_customize_dataset – Code hook to customize the retrieved dataset before QC is applied.
hook_finalize_dataset – Code hook to finalize the dataset after QC is applied but before it is saved.
hook_plot_dataset – Code hook to create plots for the data, which runs after the dataset has been saved.
run – Runs the data pipeline on the provided inputs.
Method Descriptions
- get_ancillary_filepath(title: str, extension: str = 'png', **kwargs: Any) → pathlib.Path
Returns the path where an ancillary file should be saved so that it is synced to the storage area automatically.
- Parameters:
title (str) – The title to use for the plot filename. Should only contain alphanumeric and ‘_’ characters.
extension (str, optional) – The file extension. Defaults to “png”.
- Returns:
Path – The ancillary filepath.
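The naming rule for title can be checked before the method is called. The sketch below is illustrative only: is_valid_title is a hypothetical helper, not part of tsdat, and it assumes "alphanumeric" means ASCII letters and digits. The commented calls at the end assume they are made from inside a pipeline method.

```python
import re

# Hypothetical helper (not part of tsdat): checks the documented rule that a
# title should contain only alphanumeric and '_' characters.
# Assumption: "alphanumeric" is read as ASCII letters and digits.
_TITLE_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_title(title: str) -> bool:
    """Return True if `title` uses only ASCII letters, digits, and underscores."""
    return _TITLE_PATTERN.fullmatch(title) is not None


# Inside a pipeline method, a valid title would then be passed along, e.g.:
#     plot_file = self.get_ancillary_filepath(title="wind_speed")
#     pdf_file = self.get_ancillary_filepath(title="wind_speed", extension="pdf")
```

Validating the title up front keeps spaces and path separators out of the generated filename.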
- hook_customize_dataset(dataset: xarray.Dataset) → xarray.Dataset
Code hook to customize the retrieved dataset before quality control (QC) is applied.
- Parameters:
dataset (xr.Dataset) – The output dataset structure returned by the retriever API.
- Returns:
xr.Dataset – The customized dataset.
- hook_finalize_dataset(dataset: xarray.Dataset) → xarray.Dataset
Code hook to finalize the dataset after quality control (QC) is applied but before it is saved.
- Parameters:
dataset (xr.Dataset) – The output dataset returned by the retriever API and modified by the hook_customize_dataset user code hook.
- Returns:
xr.Dataset – The finalized dataset, ready to be saved.
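The order in which the hooks run can be sketched as follows. This is a dependency-free stand-in, not tsdat's implementation: SketchPipeline, process, _apply_qc, and _save are hypothetical names, and a plain dict stands in for xarray.Dataset. Only the ordering (customize, then QC, then finalize, then save) is taken from the descriptions above.

```python
from typing import Any, Dict, List

Dataset = Dict[str, Any]  # stand-in for xarray.Dataset (assumption, keeps the sketch stdlib-only)


class SketchPipeline:
    """Hypothetical stand-in base class that only mirrors the documented hook order."""

    def __init__(self) -> None:
        self.log: List[str] = []  # records the order in which steps run

    def hook_customize_dataset(self, dataset: Dataset) -> Dataset:
        return dataset  # default: no changes

    def hook_finalize_dataset(self, dataset: Dataset) -> Dataset:
        return dataset  # default: no changes

    def _apply_qc(self, dataset: Dataset) -> Dataset:
        self.log.append("qc")  # placeholder for quality checks and controls
        return dataset

    def _save(self, dataset: Dataset) -> None:
        self.log.append("save")  # placeholder for writing to storage

    def process(self, dataset: Dataset) -> Dataset:
        dataset = self.hook_customize_dataset(dataset)  # before QC
        dataset = self._apply_qc(dataset)
        dataset = self.hook_finalize_dataset(dataset)  # after QC, before saving
        self._save(dataset)
        return dataset


class MyPipeline(SketchPipeline):
    def hook_customize_dataset(self, dataset: Dataset) -> Dataset:
        self.log.append("customize")
        # Derive a new variable so that QC also sees it.
        dataset["temperature_k"] = [t + 273.15 for t in dataset["temperature_c"]]
        return dataset

    def hook_finalize_dataset(self, dataset: Dataset) -> Dataset:
        self.log.append("finalize")
        # Add summary information just before the dataset is saved.
        dataset["n_samples"] = len(dataset["temperature_c"])
        return dataset


pipeline = MyPipeline()
result = pipeline.process({"temperature_c": [0.0, 25.0]})
```

In a real pipeline the subclass would derive from IngestPipeline and the hooks would receive and return xarray.Dataset objects; the split between the two hooks stays the same: changes that QC should see go in hook_customize_dataset, and last-minute adjustments go in hook_finalize_dataset.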
- class tsdat.pipeline.pipelines.TransformationPipeline
Bases: IngestPipeline
Pipeline class designed to read in standardized time series data and enhance its quality and usability by combining multiple sources of data, using higher-level processing techniques, etc.
- class Parameters
Bases: pydantic.BaseModel
- datastreams: List[str]
A list of the datastreams that the pipeline is configured to run for. Each datastream name should include the location and data level.
- parameters: TransformationPipeline.Parameters
Class Methods
hook_customize_input_datasets – Code hook to customize any input datasets prior to datastreams being combined and data converters being run.
run – Runs the data pipeline on the provided inputs.
Method Descriptions
- hook_customize_input_datasets(input_datasets: Dict[str, xarray.Dataset], **kwargs: Any) → Dict[str, xarray.Dataset]
Code hook to customize any input datasets prior to datastreams being combined and data converters being run.
- Parameters:
input_datasets (Dict[str, xr.Dataset]) – A dictionary mapping each input key (str) to its input dataset. Note that for transformation pipelines an input key is not the input filename; each key is instead a combination of the datastream and the date range used to pull the input data from the storage retriever.
- Returns:
Dict[str, xr.Dataset] – The customized input datasets.
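Because input keys combine a datastream with a date range, a hook body would typically match on the datastream portion of the key rather than on a filename. The sketch below shows one way this might look. It is stdlib-only and makes several assumptions: customize_inputs is a hypothetical free function standing in for the hook body, the datastream name humboldt.buoy.a1 is made up, plain dicts stand in for xarray.Dataset, and the exact key layout is not specified here, so the code tests only whether the datastream name appears in the key.

```python
from typing import Any, Dict

Dataset = Dict[str, Any]  # stand-in for xarray.Dataset (assumption)

KNOTS_TO_M_PER_S = 0.514444


def customize_inputs(input_datasets: Dict[str, Dataset]) -> Dict[str, Dataset]:
    """Sketch of a hook body: adjust only the datasets from one datastream."""
    for input_key, dataset in input_datasets.items():
        # Match on the datastream part of the key, not on a filename.
        if "humboldt.buoy.a1" in input_key:  # hypothetical datastream name
            dataset["wind_speed"] = [
                value * KNOTS_TO_M_PER_S for value in dataset["wind_speed"]
            ]
    return input_datasets
```

Returning the same dictionary with the same keys leaves the later steps, combining datastreams and running data converters, free to look up each dataset as before.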