Pipeline Code Hooks

Each pipeline base class provides hook methods that developers can override to customize pipeline behavior. In your template repository, the Pipeline class comes with all of the hook methods stubbed out (i.e., each one is included with an empty definition).

The following hook methods, all identifiable by their ‘hook_’ prefix, are provided in the pipeline template found in the pipelines/<ingest_name>/pipeline.py file. They are listed in the order in which they are executed (see image in Configuring Tsdat).

hook_customize_dataset

Code hook to customize the retrieved dataset before quality control (QC) is applied.

hook_finalize_dataset

Code hook to finalize the dataset after quality control (QC) is applied but before it is saved.

hook_plot_dataset

Code hook to create plots for the data. It runs after the dataset has been saved.
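The execution order described above can be pictured with a toy stand-in class. This is only a sketch and not tsdat's implementation: ToyPipeline, MyPipeline, the run method, the "retrieve", "apply_qc", and "save" step names, and the dictionary used in place of a dataset are all hypothetical. Only the three hook names and their relative order come from the descriptions above.

```python
from typing import Any, Dict, List


class ToyPipeline:
    """Toy stand-in that records the order in which steps run (not tsdat code)."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    # The three hooks a developer may override.
    def hook_customize_dataset(self, dataset: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append("hook_customize_dataset")
        return dataset

    def hook_finalize_dataset(self, dataset: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append("hook_finalize_dataset")
        return dataset

    def hook_plot_dataset(self, dataset: Dict[str, Any]) -> None:
        self.calls.append("hook_plot_dataset")

    def run(self, inputs: List[str]) -> Dict[str, Any]:
        # Hypothetical placeholder for retrieving the dataset.
        dataset: Dict[str, Any] = {"inputs": inputs}
        self.calls.append("retrieve")
        dataset = self.hook_customize_dataset(dataset)  # before QC
        self.calls.append("apply_qc")  # hypothetical placeholder for QC
        dataset = self.hook_finalize_dataset(dataset)  # after QC, before saving
        self.calls.append("save")  # hypothetical placeholder for saving
        self.hook_plot_dataset(dataset)  # after the dataset has been saved
        return dataset


class MyPipeline(ToyPipeline):
    """Overrides a single hook; the other hooks keep their default behavior."""

    def hook_customize_dataset(self, dataset: Dict[str, Any]) -> Dict[str, Any]:
        dataset = super().hook_customize_dataset(dataset)
        dataset["customized"] = True
        return dataset


pipeline = MyPipeline()
result = pipeline.run(["example.csv"])
print(pipeline.calls)
```

Overriding only the hooks you need, as MyPipeline does here, leaves the rest of the run sequence untouched.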

The plotting hook (hook_plot_dataset) is likely to be the most useful for users. It creates plots and saves them to the storage directory alongside the output dataset, which makes it a good way to check the pipeline output. A custom plotting example is shown below:

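# Imports needed by this example. The get_start_date_and_time_str, get_filename,
# and format_time_xticks helpers are assumed to already be imported at the top
# of the template's pipeline.py.
import matplotlib.pyplot as plt
import xarray as xr


# This method belongs inside your Pipeline class.
def hook_plot_dataset(self, dataset: xr.Dataset):
    # DEVELOPER: (Optional, recommended) Create plots.
    location = self.dataset_config.attrs.location_id
    datastream: str = self.dataset_config.attrs.datastream

    # Start date and time strings used in the plot title
    date, time = get_start_date_and_time_str(dataset)

    plt.style.use("default")  # clear any styles that were set before
    plt.style.use("shared/styling.mplstyle")

    # Files written to tmp_dir are saved to storage alongside the output dataset
    with self.storage.uploadable_dir(datastream) as tmp_dir:
        fig, ax = plt.subplots()
        dataset["example_var"].plot(ax=ax, x="time")  # type: ignore
        fig.suptitle(f"Example Variable at {location} on {date} {time}")
        format_time_xticks(ax)
        plot_file = get_filename(dataset, title="example_plot", extension="png")
        fig.savefig(tmp_dir / plot_file)
        plt.close(fig)  # release the figure's memory once it has been saved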

class tsdat.pipeline.pipelines.IngestPipeline(*, parameters: Any = {}, settings: Any = None, triggers: List[Pattern] = [], retriever: Retriever, dataset: DatasetConfig, quality: QualityManagement, storage: Storage)

Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability by converting it into a standard format, embedding metadata, applying quality checks and controls, generating reference plots, and saving the data in an accessible format so it can be used later in scientific analyses or in higher-level tsdat Pipelines.

hook_customize_dataset(dataset: Dataset) → Dataset

Code hook to customize the retrieved dataset before quality control (QC) is applied.

Parameters

dataset (xr.Dataset) – The output dataset structure returned by the retriever API.

Returns

xr.Dataset – The customized dataset.
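As a rough illustration of this hook, the sketch below derives a new variable from two retrieved ones and returns the modified dataset. The ExamplePipeline class and the wind_u, wind_v, and wind_speed variable names are hypothetical. The class deliberately does not subclass IngestPipeline so that the snippet runs with xarray alone; in a real pipeline the method body would live in your Pipeline class.

```python
import xarray as xr


class ExamplePipeline:
    """Hypothetical stand-in; a real pipeline would subclass IngestPipeline."""

    def hook_customize_dataset(self, dataset: xr.Dataset) -> xr.Dataset:
        # Derive wind speed from its two components (hypothetical variable names).
        speed = (dataset["wind_u"] ** 2 + dataset["wind_v"] ** 2) ** 0.5
        # Attach units metadata and store the result as a new variable.
        dataset["wind_speed"] = speed.assign_attrs(units="m/s")
        return dataset


# Small made-up dataset standing in for what the retriever would return.
raw = xr.Dataset(
    {
        "wind_u": ("time", [3.0, 0.0]),
        "wind_v": ("time", [4.0, 2.0]),
    },
    coords={"time": [0, 1]},
)
customized = ExamplePipeline().hook_customize_dataset(raw)
print(customized["wind_speed"].values)
```

Because this hook runs before quality control is applied, a variable derived here can still be covered by the pipeline's quality checks.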

hook_finalize_dataset(dataset: Dataset) → Dataset

Code hook to finalize the dataset after quality control (QC) is applied but before it is saved.

Parameters

dataset (xr.Dataset) – The output dataset returned by the retriever API, modified by the hook_customize_dataset user code hook, and processed by quality control.

Returns

xr.Dataset – The finalized dataset, ready to be saved.
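One possible use of this hook is last-minute cleanup before the dataset is written. The sketch below drops a scratch variable and records a note in the global attributes. The ExamplePipeline class, the scratch variable, and the comment attribute are hypothetical, and the class again avoids subclassing IngestPipeline so that the snippet runs with xarray alone.

```python
import xarray as xr


class ExamplePipeline:
    """Hypothetical stand-in; a real pipeline would subclass IngestPipeline."""

    def hook_finalize_dataset(self, dataset: xr.Dataset) -> xr.Dataset:
        # Drop a scratch variable (hypothetical name) that should not be saved.
        # errors="ignore" makes this a no-op when the variable is absent.
        dataset = dataset.drop_vars("scratch", errors="ignore")
        # Record a note in the global attributes (hypothetical attribute name).
        dataset.attrs["comment"] = "Finalized by hook_finalize_dataset."
        return dataset


# Small made-up dataset standing in for the quality-controlled dataset.
checked = xr.Dataset(
    {
        "temperature": ("time", [20.5, 21.0]),
        "scratch": ("time", [0.0, 0.0]),
    },
    coords={"time": [0, 1]},
)
finalized = ExamplePipeline().hook_finalize_dataset(checked)
print(list(finalized.data_vars))
```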

hook_plot_dataset(dataset: Dataset)

Code hook to create plots for the data. It runs after the dataset has been saved.

Parameters

dataset (xr.Dataset) – The dataset to plot.