Pipeline Code Hooks
Each pipeline base class provides methods that the developer can override to customize pipeline functionality. In your template repository, your Pipeline class comes with all of the hook methods stubbed out (i.e., each is included with an empty definition).
The following hook methods, all of which start with the 'hook_' prefix, are provided in the pipeline template found in the pipelines/<ingest_name>/pipeline.py file. They are listed in the order in which they are executed (see the image in Configuring Tsdat).
hook_customize_dataset | User-overrideable code hook that runs after the retriever has retrieved the dataset from the specified input keys, but before the pipeline has applied any quality checks or corrections to the dataset.
hook_finalize_dataset | User-overrideable code hook that runs after the dataset quality has been managed but before the dataset has been sent to the storage API to be saved.
hook_plot_dataset | User-overrideable code hook that runs after the dataset has been saved by the storage API.
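The ordering described above can be illustrated with a toy model. This is only a sketch of the call order, not tsdat's actual implementation: the class name `SketchPipeline`, the `_retrieve`, `_manage_quality`, and `_save` helpers, and the use of a plain dict in place of an `xr.Dataset` are all assumptions made for illustration.

```python
from typing import Any, Dict, List

Dataset = Dict[str, Any]  # toy stand-in for xr.Dataset


class SketchPipeline:
    """Toy model of the hook order; NOT tsdat's actual implementation."""

    def __init__(self) -> None:
        self.calls: List[str] = []  # records the order in which steps run

    # Hypothetical internal steps, named for illustration only
    def _retrieve(self, raw: Dataset) -> Dataset:
        self.calls.append("retrieve")
        return dict(raw)

    def _manage_quality(self, dataset: Dataset) -> Dataset:
        self.calls.append("manage_quality")
        return dataset

    def _save(self, dataset: Dataset) -> None:
        self.calls.append("save")

    # The three user-overrideable hooks (no-ops by default)
    def hook_customize_dataset(self, dataset: Dataset) -> Dataset:
        self.calls.append("hook_customize_dataset")
        return dataset

    def hook_finalize_dataset(self, dataset: Dataset) -> Dataset:
        self.calls.append("hook_finalize_dataset")
        return dataset

    def hook_plot_dataset(self, dataset: Dataset) -> None:
        self.calls.append("hook_plot_dataset")

    def run(self, raw: Dataset) -> Dataset:
        dataset = self._retrieve(raw)
        dataset = self.hook_customize_dataset(dataset)  # before quality checks
        dataset = self._manage_quality(dataset)
        dataset = self.hook_finalize_dataset(dataset)  # before saving
        self._save(dataset)
        self.hook_plot_dataset(dataset)  # after saving
        return dataset


pipeline = SketchPipeline()
pipeline.run({"example_var": [1.0, 2.0, 3.0]})
print(pipeline.calls)
```

Overriding any one of the three hook methods in a subclass changes what happens at that point in the run without touching the surrounding steps.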
The plotting hook (hook_plot_dataset) is likely to be the most useful for users. This hook creates plots and saves them to the storage directory alongside the output dataset, which makes it a good way to check the pipeline output. A custom plotting example is shown below:
import matplotlib.pyplot as plt
import xarray as xr

# The helpers used below (get_start_date_and_time_str, get_filename, and
# format_time_xticks) are assumed to be imported at the top of pipeline.py,
# as in the pipeline template.

def hook_plot_dataset(self, dataset: xr.Dataset):
    # DEVELOPER: (Optional, recommended) Create plots.
    location = self.dataset_config.attrs.location_id
    datastream: str = self.dataset_config.attrs.datastream

    date, time = get_start_date_and_time_str(dataset)

    plt.style.use("default")  # clear any styles that were set before
    plt.style.use("shared/styling.mplstyle")

    # Write the plot into tmp_dir so it is stored alongside the output dataset
    with self.storage.uploadable_dir(datastream) as tmp_dir:
        fig, ax = plt.subplots()
        dataset["example_var"].plot(ax=ax, x="time")  # type: ignore
        fig.suptitle(f"Example Variable at {location} on {date} {time}")
        format_time_xticks(ax)
        plot_file = get_filename(dataset, title="example_plot", extension="png")
        fig.savefig(tmp_dir / plot_file)
        plt.close(fig)  # release the figure's memory
class tsdat.pipeline.pipelines.IngestPipeline(*, parameters: Any = {}, settings: Any = None, triggers: List[Pattern] = [], retriever: tsdat.io.base.Retriever, dataset: tsdat.config.dataset.DatasetConfig, quality: tsdat.qc.base.QualityManagement, storage: tsdat.io.base.Storage)
    Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability by converting it into a standard format, embedding metadata, applying quality checks and controls, generating reference plots, and saving the data in an accessible format so it can be used later in scientific analyses or in higher-level tsdat Pipelines.
hook_customize_dataset(dataset: xarray.core.dataset.Dataset) → xarray.core.dataset.Dataset
    User-overrideable code hook that runs after the retriever has retrieved the dataset from the specified input keys, but before the pipeline has applied any quality checks or corrections to the dataset.
    Parameters: dataset (xr.Dataset) – The output dataset structure returned by the retriever API.
    Returns: The customized dataset.
    Return type: xr.Dataset
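As a minimal sketch of what an override of this hook might look like, the example below derives a new variable before any quality checks run. The `ExamplePipeline` class is a stand-in so the snippet runs without tsdat installed (a real pipeline would subclass `IngestPipeline`), and the variable names `temp_c` and `temp_k` are hypothetical.

```python
import numpy as np
import xarray as xr


class ExamplePipeline:  # stand-in; a real pipeline would subclass IngestPipeline
    def hook_customize_dataset(self, dataset: xr.Dataset) -> xr.Dataset:
        # Derive a Kelvin variable from a (hypothetical) Celsius input variable
        temp_k = (dataset["temp_c"] + 273.15).assign_attrs(units="K")
        dataset["temp_k"] = temp_k
        return dataset  # the hook must return the customized dataset


# Small dataset standing in for what the retriever would return
raw = xr.Dataset(
    {"temp_c": ("time", np.array([0.0, 25.0]))},
    coords={
        "time": np.array(
            ["2022-01-01T00:00", "2022-01-01T00:10"], dtype="datetime64[ns]"
        )
    },
)
customized = ExamplePipeline().hook_customize_dataset(raw)
print(customized["temp_k"].values)
```

Because this hook runs before quality management, any quality checks configured for a derived variable would see the values computed here.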
hook_finalize_dataset(dataset: xarray.core.dataset.Dataset) → xarray.core.dataset.Dataset
    User-overrideable code hook that runs after the dataset quality has been managed but before the dataset has been sent to the storage API to be saved.
    Parameters: dataset (xr.Dataset) – The output dataset returned by the retriever API and modified by the hook_customize_dataset user code hook.
    Returns: The finalized dataset, ready to be saved.
    Return type: xr.Dataset
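One possible use of this hook is last-minute cleanup before the dataset is saved. The sketch below again uses a stand-in `ExamplePipeline` class rather than a real `IngestPipeline` subclass; the `scratch` variable and the `example_var_max` attribute are hypothetical names chosen for illustration.

```python
import numpy as np
import xarray as xr


class ExamplePipeline:  # stand-in; a real pipeline would subclass IngestPipeline
    def hook_finalize_dataset(self, dataset: xr.Dataset) -> xr.Dataset:
        # Drop an intermediate variable that should not be written to storage
        dataset = dataset.drop_vars("scratch", errors="ignore")
        # Record a summary statistic as a global attribute
        dataset.attrs["example_var_max"] = float(dataset["example_var"].max())
        return dataset  # the hook must return the finalized dataset


checked = xr.Dataset(
    {
        "example_var": ("time", np.array([1.0, 3.0, 2.0])),
        "scratch": ("time", np.zeros(3)),
    },
    coords={"time": np.arange(3)},
)
finalized = ExamplePipeline().hook_finalize_dataset(checked)
print(list(finalized.data_vars), finalized.attrs)
```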