tsdat 

Writes the dataset to the provided filepath.

Method Descriptions

write(self, dataset: xarray.Dataset, filepath: Optional[pathlib.Path] = None) → None

Writes the dataset to the provided filepath.

This method is typically called by the tsdat storage API, which will be responsible for providing the filepath, including the file extension.

Parameters

dataset (xr.Dataset) – The dataset to save.
filepath (Optional[Path]) – The path to the file to save.

class tsdat.CheckFailDelta[source]

Bases: _CheckDelta

Checks for deltas between consecutive values larger than ‘fail_delta’.

attribute_name :str = fail_delta

class tsdat.CheckFailMax[source]

Bases: _CheckMax

Checks for values greater than ‘fail_max’.

attribute_name :str = fail_max

class tsdat.CheckFailMin[source]

Bases: _CheckMin

Checks for values less than ‘fail_min’.

attribute_name :str = fail_min

class tsdat.CheckFailRangeMax[source]

Bases: _CheckMax

Checks for values greater than ‘fail_range’.

attribute_name :str = fail_range

class tsdat.CheckFailRangeMin[source]

Bases: _CheckMin

Checks for values less than ‘fail_range’.

attribute_name :str = fail_range

class tsdat.CheckMissing[source]

Bases: tsdat.qc.base.QualityChecker

Checks if any data are missing. A variable’s data are considered missing if they are set to the variable’s _FillValue (if it has a _FillValue) or NaN (NaT for datetime- like variables).

Class Methods

check_monotonic_not_increasing_and_decreasing

Identifies and flags quality problems with the data.

Method Descriptions

run(self, dataset: xarray.Dataset, variable_name: str) → numpy.typing.NDArray[numpy.bool8]

Identifies and flags quality problems with the data.

Checks the quality of a specific variable in the dataset and returns the results of the check as a boolean array where True values represent quality problems and False values represent data that passes the quality check.

QualityCheckers should not modify dataset variables; changes to the dataset should be made by QualityHandler(s), which receive the results of a QualityChecker as input.

Parameters

dataset (xr.Dataset) – The dataset containing the variable to check.
variable_name (str) – The name of the variable to check.

Returns

NDArray[np.bool8] – The results of the quality check, where True values indicate a quality problem.

class tsdat.CheckMonotonic[source]

Bases: tsdat.qc.base.QualityChecker

Checks if any values are not ordered strictly monotonically (i.e. values must all be increasing or all decreasing). The check marks all values as failed if any data values are not ordered monotonically.

class Parameters

Bases: pydantic.BaseModel

dim :Optional[str]

require_decreasing :bool = False

require_increasing :bool = False

Class Methods

Method Descriptions

classmethod check_monotonic_not_increasing_and_decreasing(cls, inc: bool, values: Dict[str, Any]) → bool

parameters :CheckMonotonic.Parameters

Class Methods

`get_axis`
`run`	Identifies and flags quality problems with the data.

Method Descriptions

get_axis(self, variable: xarray.DataArray) → int

run(self, dataset: xarray.Dataset, variable_name: str) → numpy.typing.NDArray[numpy.bool8]

Identifies and flags quality problems with the data.

Checks the quality of a specific variable in the dataset and returns the results of the check as a boolean array where True values represent quality problems and False values represent data that passes the quality check.

QualityCheckers should not modify dataset variables; changes to the dataset should be made by QualityHandler(s), which receive the results of a QualityChecker as input.

Parameters

dataset (xr.Dataset) – The dataset containing the variable to check.
variable_name (str) – The name of the variable to check.

Returns

NDArray[np.bool8] – The results of the quality check, where True values indicate a quality problem.

class tsdat.CheckValidDelta[source]

Bases: _CheckDelta

Checks for deltas between consecutive values larger than ‘valid_delta’.

attribute_name :str = valid_delta

class tsdat.CheckValidMax[source]

Bases: _CheckMax

Checks for values greater than ‘valid_max’.

attribute_name :str = valid_max

class tsdat.CheckValidMin[source]

Bases: _CheckMin

Checks for values less than ‘valid_min’.

attribute_name :str = valid_min

class tsdat.CheckValidRangeMax[source]

Bases: _CheckMax

Checks for values greater than ‘valid_range’.

attribute_name :str = valid_range

class tsdat.CheckValidRangeMin[source]

Bases: _CheckMin

Checks for values less than ‘valid_range’.

attribute_name :str = valid_range

class tsdat.CheckWarnDelta[source]

Bases: _CheckDelta

Checks for deltas between consecutive values larger than ‘warn_delta’.

attribute_name :str = warn_delta

class tsdat.CheckWarnMax[source]

Bases: _CheckMax

Checks for values greater than ‘warn_max’.

attribute_name :str = warn_max

class tsdat.CheckWarnMin[source]

Bases: _CheckMin

Checks for values less than ‘warn_min’.

attribute_name :str = warn_min

class tsdat.CheckWarnRangeMax[source]

Bases: _CheckMax

Checks for values greater than ‘warn_range’.

attribute_name :str = warn_range

class tsdat.CheckWarnRangeMin[source]

Bases: _CheckMin

Checks for values less than ‘warn_range’.

attribute_name :str = warn_range

class tsdat.DataConverter[source]

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for running data conversions on retrieved raw dataset.

Class Methods

convert

Runs the data converter on the provided (retrieved) dataset.

Method Descriptions

abstract convert(self, dataset: xarray.Dataset, dataset_config: tsdat.config.dataset.DatasetConfig, variable_name: str, **kwargs: Any) → xarray.Dataset

Runs the data converter on the provided (retrieved) dataset.

Parameters

dataset (xr.Dataset) – The dataset to convert.
dataset_config (DatasetConfig) – The dataset configuration.
variable_name (str) – The name of the variable to convert.

Returns

xr.Dataset – The converted dataset.

class tsdat.DataHandler[source]

Bases: tsdat.utils.ParameterizedClass

Groups a DataReader subclass and a DataWriter subclass together.

This provides a unified approach to data I/O. DataHandlers are typically expected to be able to round-trip the data, i.e. the following psuedocode is generally true:

handler.read(handler.write(dataset))) == dataset

Parameters

reader (DataReader) – The DataReader subclass responsible for reading input data.
writer (FileWriter) – The FileWriter subclass responsible for writing output
data. –

parameters :Any

reader :DataReader

writer :DataWriter

class tsdat.DataReader[source]

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for reading data from an input source.

Parameters

regex (Pattern[str]) – The regex pattern associated with the DataReader. If
pipeline (calling the DataReader from a tsdat) –
checked (this pattern will be) –
called. (against each possible input key before the read() method is) –

Class Methods

read

Reads data given an input key.

Method Descriptions

abstract read(self, input_key: str) → Union[xarray.Dataset, Dict[str, xarray.Dataset]]

Reads data given an input key.

Uses the input key to open a resource and load data as a xr.Dataset object or as a mapping of strings to xr.Dataset objects.

In most cases DataReaders will only need to return a single xr.Dataset, but occasionally some types of inputs necessitate that the data loaded from the input_key be returned as a mapping. For example, if the input_key is a path to a zip file containing multiple disparate datasets, then returning a mapping is appropriate.

Parameters

input_key (str) – An input key matching the DataReader’s regex pattern that should be used to load data.

Returns

Union[xr.Dataset, Dict[str, xr.Dataset]] –

The raw data extracted from the: provided input key.

class tsdat.DataWriter[source]

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for writing data to storage area(s).

Class Methods

Writes the dataset to the storage area.

Method Descriptions

abstract write(self, dataset: xarray.Dataset, **kwargs: Any) → None

Writes the dataset to the storage area.

This method is typically called by the tsdat storage API, which will be responsible for providing any additional parameters required by subclasses of the tsdat.io.base.DataWriter class.

Parameters: dataset (xr.Dataset) – The dataset to save.

class tsdat.DatasetConfig[source]

Bases: tsdat.config.utils.YamlModel

Defines the structure and metadata of the dataset produced by a tsdat pipeline.

Also provides methods to support yaml parsing and validation, including generation of json schema.

Parameters

attrs (GlobalAttributes) – Attributes that pertain to the dataset as a whole.
coords (Dict[str, Coordinate]) – The dataset’s coordinate variables.
data_vars (Dict[str, Variable]) – The dataset’s data variables.

attrs :tsdat.config.attributes.GlobalAttributes

coords :Dict[VarName, tsdat.config.variables.Coordinate]

data_vars :Dict[VarName, tsdat.config.variables.Variable]

Class Methods

`__contains__`
`__getitem__`
`set_variable_name_property`
`time_in_coords`
`validate_variable_name_uniqueness`
`variable_names_are_legal`

Method Descriptions

__contains__(self, __o: object) → bool

__getitem__(self, name: str) → Union[tsdat.config.variables.Variable, tsdat.config.variables.Coordinate]

classmethod set_variable_name_property(cls, vars: Dict[str, Dict[str, Any]]) → Dict[str, Dict[str, Any]]

classmethod time_in_coords(cls, coords: Dict[VarName, tsdat.config.variables.Coordinate]) → Dict[VarName, tsdat.config.variables.Coordinate]

classmethod validate_variable_name_uniqueness(cls, values: Any) → Any

variable_names_are_legal(cls, vars: Dict[str, tsdat.config.variables.Variable], field: pydantic.fields.ModelField) → Dict[str, tsdat.config.variables.Variable]

class tsdat.DefaultRetriever[source]

Bases: tsdat.io.base.Retriever

Default API for retrieving data from one or more input sources.

Reads data from one or more inputs, renames coordinates and data variables according to retrieval and dataset configurations, and applies registered DataConverters to retrieved data.

Parameters

readers (Dict[Pattern[str], DataReader]) – A mapping of patterns to DataReaders that the retriever uses to determine which DataReader to use for reading any given input key.
coords (Dict[str, Dict[Pattern[str], VariableRetriever]]) – A dictionary mapping output coordinate variable names to rules for how they should be retrieved.
data_vars (Dict[str, Dict[Pattern[str], VariableRetriever]]) – A dictionary mapping output data variable names to rules for how they should be retrieved.

class Parameters

Bases: pydantic.BaseModel

merge_kwargs :Dict[str, Any]: Keyword arguments passed to xr.merge(). This is only relevant if multiple input keys are provided simultaneously, or if any registered DataReader objects could return a dataset mapping instead of a single dataset.

coords :Dict[str, Dict[Pattern, RetrievedVariable]]: A dictionary mapping output coordinate names to the retrieval rules and preprocessing actions (e.g., DataConverters) that should be applied to each retrieved coordinate variable.

data_vars :Dict[str, Dict[Pattern, RetrievedVariable]]: A dictionary mapping output data variable names to the retrieval rules and preprocessing actions (e.g., DataConverters) that should be applied to each retrieved data variable.

parameters :DefaultRetriever.Parameters

readers :Dict[Pattern, tsdat.io.base.DataReader]: A dictionary of DataReaders that should be used to read data provided an input key.

Class Methods

retrieve

Prepares the raw dataset mapping for use in downstream pipeline processes.

Method Descriptions

retrieve(self, input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, **kwargs: Any) → xarray.Dataset

Prepares the raw dataset mapping for use in downstream pipeline processes.

This is done by consolidating the data into a single xr.Dataset object. The retrieved dataset may contain additional coords and data_vars that are not defined in the output dataset. Input data converters are applied as part of the preparation process.

Parameters

input_keys (List[str]) – The input keys the registered DataReaders should read from.
dataset_config (DatasetConfig) – The specification of the output dataset.

Returns

xr.Dataset – The retrieved dataset.

class tsdat.FailPipeline[source]

Raises a DataQualityError, halting the pipeline, if the data quality are sufficiently bad. This usually indicates that a manual inspection of the data is recommended.

Raises: DataQualityError – DataQualityError

class Parameters

Bases: pydantic.BaseModel

context :str =: Additional context set by users that ends up in the traceback message.

tolerance :float = 0: Tolerance for the number of allowable failures as the ratio of allowable failures to the total number of values checked. Defaults to 0, meaning that any failed checks will result in a DataQualityError being raised.

parameters :FailPipeline.Parameters

Class Methods

Takes some action on data that has had quality issues identified.

Method Descriptions

run(self, dataset: xarray.Dataset, variable_name: str, failures: numpy.typing.NDArray[numpy.bool8])

Takes some action on data that has had quality issues identified.

Handles the quality of a variable in the dataset and returns the dataset after any corrections have been applied.

Parameters

dataset (xr.Dataset) – The dataset containing the variable to handle.
variable_name (str) – The name of the variable whose quality should be handled.
failures (NDArray[np.bool8]) – The results of the QualityChecker for the provided variable, where True values indicate a quality problem.

Returns

xr.Dataset – The dataset after the QualityHandler has been run.

class tsdat.FileHandler[source]

Bases: DataHandler

DataHandler specifically tailored to reading and writing files of a specific type.

Parameters

reader (DataReader) – The DataReader subclass responsible for reading input data.
writer (FileWriter) – The FileWriter subclass responsible for writing output
data. –

reader :DataReader

writer :FileWriter

class tsdat.FileSystem[source]

Bases: tsdat.io.base.Storage

Handles data storage and retrieval for file-based data formats.

Formats that write to directories (such as zarr) are not supported by the FileSystem storage class.

Parameters

parameters (Parameters) – File-system specific parameters, such as the root path to where files should be saved, or additional keyword arguments to specific functions used by the storage API. See the FileSystemStorage.Parameters class for more details.
handler (FileHandler) – The FileHandler class that should be used to handle data I/O within the storage API.

class Parameters

Bases: pydantic.BaseSettings

file_timespan :Optional[str]

merge_fetched_data_kwargs :Dict[str, Any]

storage_root :pathlib.Path: The path on disk where data and ancillary files will be saved to. Defaults to the storage/root folder in the active working directory. The directory is created as this parameter is set, if the directory does not already exist.

handler :tsdat.io.handlers.FileHandler

parameters :FileSystem.Parameters

Class Methods

`fetch_data`	Fetches data for a given datastream between a specified time range.
`save_ancillary_file`	Saves an ancillary filepath to the datastream's ancillary storage area.
`save_data`	Saves a dataset to the storage area.

Method Descriptions

fetch_data(self, start: datetime.datetime, end: datetime.datetime, datastream: str) → xarray.Dataset

Fetches data for a given datastream between a specified time range.

Note: this method is not smart; it searches for the appropriate data files using their filenames and does not filter within each data file.

Parameters

start (datetime) – The minimum datetime to fetch.
end (datetime) – The maximum datetime to fetch.
datastream (str) – The datastream id to search for.

Returns

xr.Dataset – A dataset containing all the data in the storage area that spans the specified datetimes.

save_ancillary_file(self, filepath: pathlib.Path, datastream: str)

Saves an ancillary filepath to the datastream’s ancillary storage area.

Parameters

filepath (Path) – The path to the ancillary file.
datastream (str) – The datastream that the file is related to.

save_data(self, dataset: xarray.Dataset)

Saves a dataset to the storage area.

At a minimum, the dataset must have a ‘datastream’ global attribute and must have a ‘time’ variable with a np.datetime64-like data type.

Parameters: dataset (xr.Dataset) – The dataset to save.

class tsdat.FileWriter[source]

Bases: DataWriter, abc.ABC

Base class for file-based DataWriters.

Parameters: file_extension (str) – The file extension that the FileHandler should be used for, e.g., “.nc”, “.csv”, …

file_extension :str

Class Methods

Writes the dataset to the provided filepath.

Method Descriptions

abstract write(self, dataset: xarray.Dataset, filepath: Optional[pathlib.Path] = None, **kwargs: Any) → None

Writes the dataset to the provided filepath.

This method is typically called by the tsdat storage API, which will be responsible for providing the filepath, including the file extension.

Parameters

dataset (xr.Dataset) – The dataset to save.
filepath (Optional[Path]) – The path to the file to save.

class tsdat.IngestPipeline[source]

Bases: tsdat.pipeline.base.Pipeline

Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability by converting it into a standard format, embedding metadata, applying quality checks and controls, generating reference plots, and saving the data in an accessible format so it can be used later in scientific analyses or in higher-level tsdat Pipelines.

Class Methods

`hook_customize_dataset`	Code hook to customize the retrieved dataset prior to qc being applied.
`hook_finalize_dataset`	Code hook to finalize the dataset after qc is applied but before it is saved.
`hook_plot_dataset`	Code hook to create plots for the data which runs after the dataset has been saved.
`run`	Runs the data pipeline on the provided inputs.

Method Descriptions

hook_customize_dataset(self, dataset: xarray.Dataset) → xarray.Dataset

Code hook to customize the retrieved dataset prior to qc being applied.

Parameters: dataset (xr.Dataset) – The output dataset structure returned by the retriever API.
Returns: xr.Dataset – The customized dataset.

hook_finalize_dataset(self, dataset: xarray.Dataset) → xarray.Dataset

Code hook to finalize the dataset after qc is applied but before it is saved.

Parameters: dataset (xr.Dataset) – The output dataset returned by the retriever API and modified by the hook_customize_dataset user code hook.
Returns: xr.Dataset – The finalized dataset, ready to be saved.

hook_plot_dataset(self, dataset: xarray.Dataset)

Code hook to create plots for the data which runs after the dataset has been saved.

Parameters: dataset (xr.Dataset) – The dataset to plot.

run(self, inputs: List[str], **kwargs: Any) → xarray.Dataset

Runs the data pipeline on the provided inputs.

Parameters

inputs (List[str]) – A list of input keys that the pipeline’s Retriever class
pipeline. (can use to load data into the) –

Returns

xr.Dataset – The processed dataset.

class tsdat.NetCDFHandler[source]

Bases: tsdat.io.base.FileHandler

DataHandler specifically tailored to reading and writing files of a specific type.

Parameters

reader (DataReader) – The DataReader subclass responsible for reading input data.
writer (FileWriter) – The FileWriter subclass responsible for writing output
data. –

extension :str = nc

reader :tsdat.io.readers.NetCDFReader

writer :tsdat.io.writers.NetCDFWriter

class tsdat.NetCDFReader[source]

Bases: tsdat.io.base.DataReader

Thin wrapper around xarray’s open_dataset() function, with optional parameters used as keyword arguments in the function call.

parameters :Dict[str, Any]

Class Methods

read

Reads data given an input key.

Method Descriptions

read(self, input_key: str) → xarray.Dataset

Reads data given an input key.

Uses the input key to open a resource and load data as a xr.Dataset object or as a mapping of strings to xr.Dataset objects.

In most cases DataReaders will only need to return a single xr.Dataset, but occasionally some types of inputs necessitate that the data loaded from the input_key be returned as a mapping. For example, if the input_key is a path to a zip file containing multiple disparate datasets, then returning a mapping is appropriate.

Parameters

input_key (str) – An input key matching the DataReader’s regex pattern that should be used to load data.

Returns

Union[xr.Dataset, Dict[str, xr.Dataset]] –

The raw data extracted from the: provided input key.

class tsdat.NetCDFWriter[source]

Bases: tsdat.io.base.FileWriter

Thin wrapper around xarray’s Dataset.to_netcdf() function for saving a dataset to a netCDF file. Properties under the to_netcdf_kwargs parameter will be passed to Dataset.to_netcdf() as keyword arguments.

File compression is used by default to save disk space. To disable compression set the use_compression parameter to False.

class Parameters

Bases: pydantic.BaseModel

compression_kwargs :Dict[str, Any]

to_netcdf_kwargs :Dict[str, Any]

use_compression :bool = True

file_extension :str = nc

parameters :NetCDFWriter.Parameters

Class Methods

Writes the dataset to the provided filepath.

Method Descriptions

write(self, dataset: xarray.Dataset, filepath: Optional[pathlib.Path] = None) → None

Writes the dataset to the provided filepath.

This method is typically called by the tsdat storage API, which will be responsible for providing the filepath, including the file extension.

Parameters

dataset (xr.Dataset) – The dataset to save.
filepath (Optional[Path]) – The path to the file to save.

class tsdat.ParameterizedClass[source]

Bases: pydantic.BaseModel

Base class for any class that accepts ‘parameters’ as an argument.

Sets the default ‘parameters’ to {}. Subclasses of ParameterizedClass should override the ‘parameters’ properties to support custom required or optional arguments from configuration files.

parameters :Any

class tsdat.ParameterizedConfigClass[source]

Bases: pydantic.BaseModel

classname :pydantic.StrictStr

parameters :Dict[str, Any]

Class Methods

`classname_looks_like_a_module`
`instantiate`	Instantiates and returns the class specified by the 'classname' parameter.

Method Descriptions

classmethod classname_looks_like_a_module(cls, v: pydantic.StrictStr) → pydantic.StrictStr

instantiate(self) → Any

Instantiates and returns the class specified by the ‘classname’ parameter.

Returns: Any – An instance of the specified class.

class tsdat.Pipeline[source]

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for tsdat data pipelines.

dataset_config :tsdat.config.dataset.DatasetConfig: Describes the structure and metadata of the output dataset.

quality :tsdat.qc.base.QualityManagement: Manages the dataset quality through checks and corrections.

retriever :tsdat.io.base.Retriever: Retrieves data from input keys.

settings :Any

storage :tsdat.io.base.Storage: Stores the dataset so it can be retrieved later.

triggers :List[Pattern] = []: Regex patterns matching input keys to determine when the pipeline should run.

Class Methods

`prepare_retrieved_dataset`	Modifies the retrieved dataset by dropping variables not declared in the
`run`	Runs the data pipeline on the provided inputs.

Method Descriptions

prepare_retrieved_dataset(self, dataset: xarray.Dataset) → xarray.Dataset

Modifies the retrieved dataset by dropping variables not declared in the DatasetConfig, adding static variables, initializing non-retrieved variables, and importing global and variable-level attributes from the DatasetConfig.

Parameters: dataset (xr.Dataset) – The retrieved dataset.
Returns: xr.Dataset – The dataset with structure and metadata matching the DatasetConfig.

abstract run(self, inputs: List[str], **kwargs: Any) → Any

Runs the data pipeline on the provided inputs.

Parameters

inputs (List[str]) – A list of input keys that the pipeline’s Retriever class
pipeline. (can use to load data into the) –

Returns

xr.Dataset – The processed dataset.

class tsdat.PipelineConfig[source]

Bases: tsdat.config.utils.ParameterizedConfigClass, tsdat.config.utils.YamlModel

Contains configuration parameters for tsdat pipelines.

This class is ultimately converted into a tsdat.pipeline.base.Pipeline subclass that will be used to process data.

Provides methods to support yaml parsing and validation, including the generation of json schema for immediate validation. This class also provides a method to instantiate a tsdat.pipeline.base.Pipeline subclass from a parsed configuration file.

Parameters

classname (str) – The dotted module path to the pipeline that the specified configurations should apply to. To use the built-in IngestPipeline, for example, you would set ‘tsdat.pipeline.pipelines.IngestPipeline’ as the classname.
triggers (List[Pattern[str]]) – A list of regex patterns that should trigger this pipeline when matched with an input key.
retriever (Union[Overrideable[RetrieverConfig], RetrieverConfig]) – Either the path to the retriever configuration yaml file and any overrides that should be applied, or the retriever configurations themselves.
dataset (Union[Overrideable[DatasetConfig], DatasetConfig]) – Either the path to the dataset configuration yaml file and any overrides that should be applied, or the dataset configurations themselves.
quality (Union[Overrideable[QualityConfig], QualityConfig]) – Either the path to the quality configuration yaml file and any overrides that should be applied, or the quality configurations themselves.
storage (Union[Overrideable[StorageConfig], StorageConfig]) – Either the path to the storage configuration yaml file and any overrides that should be applied, or the storage configurations themselves.

dataset :Union[tsdat.config.utils.Overrideable[tsdat.config.dataset.DatasetConfig], tsdat.config.dataset.DatasetConfig]

quality :Union[tsdat.config.utils.Overrideable[tsdat.config.quality.QualityConfig], tsdat.config.quality.QualityConfig]

retriever :Union[tsdat.config.utils.Overrideable[tsdat.config.retriever.RetrieverConfig], tsdat.config.retriever.RetrieverConfig]

storage :Union[tsdat.config.utils.Overrideable[tsdat.config.storage.StorageConfig], tsdat.config.storage.StorageConfig]

triggers :List[Pattern]

Class Methods

`instantiate_pipeline`	Loads the tsdat.pipeline.BasePipeline subclass specified by the classname property.
`merge_overrideable_yaml`

Method Descriptions

instantiate_pipeline(self) → tsdat.pipeline.base.Pipeline

Loads the tsdat.pipeline.BasePipeline subclass specified by the classname property.

Properties and sub-properties of the PipelineConfig class that are subclasses of tsdat.config.utils.ParameterizedConfigClass (e.g, classes that define a ‘classname’ and optional ‘parameters’ properties) will also be instantiated in similar fashion. See tsdat.config.utils.recursive_instantiate for implementation details.

Returns: Pipeline – An instance of a tsdat.pipeline.base.Pipeline subclass.

classmethod merge_overrideable_yaml(cls, v: Dict[str, Any], values: Dict[str, Any], field: pydantic.fields.ModelField)

class tsdat.QualityChecker[source]

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for code that checks the dataset / data variable quality.

Class Methods

validate_manager_names_are_unique

Identifies and flags quality problems with the data.

Method Descriptions

abstract run(self, dataset: xarray.Dataset, variable_name: str) → numpy.typing.NDArray[numpy.bool8]

Identifies and flags quality problems with the data.

Checks the quality of a specific variable in the dataset and returns the results of the check as a boolean array where True values represent quality problems and False values represent data that passes the quality check.

QualityCheckers should not modify dataset variables; changes to the dataset should be made by QualityHandler(s), which receive the results of a QualityChecker as input.

Parameters

dataset (xr.Dataset) – The dataset containing the variable to check.
variable_name (str) – The name of the variable to check.

Returns

NDArray[np.bool8] – The results of the quality check, where True values indicate a quality problem.

class tsdat.QualityConfig[source]

Bases: tsdat.config.utils.YamlModel

Contains quality configuration parameters for tsdat pipelines.

This class will ultimately be converted into a tsdat.qc.base.QualityManagement class for use in downstream tsdat pipeline code.

Provides methods to support yaml parsing and validation, including the generation of json schema for immediate validation.

Parameters: managers (List[ManagerConfig]) – A list of quality checks and controls that should be applied.

managers :List[ManagerConfig]

Class Methods

Method Descriptions

classmethod validate_manager_names_are_unique(cls, v: List[ManagerConfig]) → List[ManagerConfig]

class tsdat.QualityHandler[source]

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for code that handles the dataset / data variable quality.

Class Methods

Takes some action on data that has had quality issues identified.

Method Descriptions

abstract run(self, dataset: xarray.Dataset, variable_name: str, failures: numpy.typing.NDArray[numpy.bool8]) → xarray.Dataset

Takes some action on data that has had quality issues identified.

Handles the quality of a variable in the dataset and returns the dataset after any corrections have been applied.

Parameters

dataset (xr.Dataset) – The dataset containing the variable to handle.
variable_name (str) – The name of the variable whose quality should be handled.
failures (NDArray[np.bool8]) – The results of the QualityChecker for the provided variable, where True values indicate a quality problem.

Returns

xr.Dataset – The dataset after the QualityHandler has been run.

class tsdat.QualityManagement[source]

Bases: pydantic.BaseModel

Main class for orchestrating the dispatch of QualityCheckers and QualityHandlers.

Parameters: managers (List[QualityManager]) – The list of QualityManagers that should be run.

managers :List[QualityManager]

Class Methods

manage

Runs the registered QualityManagers on the dataset.

Method Descriptions

manage(self, dataset: xarray.Dataset) → xarray.Dataset

Runs the registered QualityManagers on the dataset.

Parameters: dataset (xr.Dataset) – The dataset to apply quality checks and controls to.
Returns: xr.Dataset – The quality-checked dataset.

class tsdat.QualityManager[source]

Bases: pydantic.BaseModel

Groups a QualityChecker and one or more QualityHandlers together.

Parameters

name (str) – The name of the quality manager.
checker (QualityChecker) – The quality check that should be run.
handlers (QualityHandler) – One or more QualityHandlers that should be run given the results of the checker.
apply_to (List[str]) – A list of variables that the check should run for. Accepts keywords of ‘COORDS’ or ‘DATA_VARS’, or any number of specific variables that should be run.
exclude (List[str]) – A list of variables that the check should exclude. Accepts the same keywords as apply_to.

apply_to :List[str]

checker :QualityChecker

exclude :List[str] = []

handlers :List[QualityHandler]

name :str

Class Methods

Runs the quality manager on the dataset.

Method Descriptions

run(self, dataset: xarray.Dataset) → xarray.Dataset

Runs the quality manager on the dataset.

Parameters: dataset (xr.Dataset) – The dataset to apply quality checks / controls to.
Returns: xr.Dataset – The dataset after the quality check and controls have been applied.

class tsdat.RecordQualityResults[source]

Records the results of the quality check in an ancillary qc variable. Creates the ancillary qc variable if one does not already exist.

class Parameters

Bases: pydantic.BaseModel

assessment :Literal[bad, indeterminate]: Indicates the quality of the data if the test results indicate a failure.

bit :int: The bit number (e.g., 1, 2, 3, …) used to indicate if the check passed. The quality results are bitpacked into an integer array to preserve space. For example, if ‘check #0’ uses bit 0 and fails, and ‘check #1’ uses bit 1 and fails then the resulting value on the qc variable would be 2^(0) + 2^(1) = 3. If we had a third check it would be 2^(0) + 2^(1) + 2^(2) = 7.

meaning :str: A string that describes the test applied.

parameters :RecordQualityResults.Parameters

Class Methods

Takes some action on data that has had quality issues identified.

Method Descriptions

run(self, dataset: xarray.Dataset, variable_name: str, failures: numpy.typing.NDArray[numpy.bool8]) → xarray.Dataset

Takes some action on data that has had quality issues identified.

Handles the quality of a variable in the dataset and returns the dataset after any corrections have been applied.

Parameters

dataset (xr.Dataset) – The dataset containing the variable to handle.
variable_name (str) – The name of the variable whose quality should be handled.
failures (NDArray[np.bool8]) – The results of the QualityChecker for the provided variable, where True values indicate a quality problem.

Returns

xr.Dataset – The dataset after the QualityHandler has been run.

class tsdat.ReplaceFailedValues[source]

Replaces all failed values with the variable’s _FillValue. If the variable does not have a _FillValue attribute then nan is used instead

Class Methods

coerce_to_patterned_retriever

Takes some action on data that has had quality issues identified.

Method Descriptions

run(self, dataset: xarray.Dataset, variable_name: str, failures: numpy.typing.NDArray[numpy.bool8]) → xarray.Dataset

Takes some action on data that has had quality issues identified.

Handles the quality of a variable in the dataset and returns the dataset after any corrections have been applied.

Parameters

dataset (xr.Dataset) – The dataset containing the variable to handle.
variable_name (str) – The name of the variable whose quality should be handled.
failures (NDArray[np.bool8]) – The results of the QualityChecker for the provided variable, where True values indicate a quality problem.

Returns

xr.Dataset – The dataset after the QualityHandler has been run.

class tsdat.Retriever[source]

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for retrieving data used as input to tsdat pipelines.

Parameters: readers (Dict[str, DataReader]) – The mapping of readers that should be used to retrieve data given input_keys and optional keyword arguments provided by subclasses of Retriever.

readers :Dict[Pattern, Any]: Mapping of readers that should be used to read data given input keys.

Class Methods

retrieve

Prepares the raw dataset mapping for use in downstream pipeline processes.

Method Descriptions

abstract retrieve(self, input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, **kwargs: Any) → xarray.Dataset

Prepares the raw dataset mapping for use in downstream pipeline processes.

This is done by consolidating the data into a single xr.Dataset object. The retrieved dataset may contain additional coords and data_vars that are not defined in the output dataset. Input data converters are applied as part of the preparation process.

Parameters

input_keys (List[str]) – The input keys the registered DataReaders should read from.
dataset_config (DatasetConfig) – The specification of the output dataset.

Returns

xr.Dataset – The retrieved dataset.

class tsdat.RetrieverConfig[source]

Bases: tsdat.config.utils.ParameterizedConfigClass, tsdat.config.utils.YamlModel

Contains configuration parameters for the tsdat retriever class.

This class will ultimately be converted into a tsdat.io.base.Retriever subclass for use in tsdat pipelines.

Provides methods to support yaml parsing and validation, including the generation of json schema for immediate validation. This class also provides a method to instantiate a tsdat.io.base.Retriever subclass from a parsed configuration file.

Parameters

classname (str) – The dotted module path to the pipeline that the specified configurations should apply to. To use the built-in IngestPipeline, for example, you would set ‘tsdat.pipeline.pipelines.IngestPipeline’ as the classname.
readers (Dict[str, DataReaderConfig]) – The DataReaders to use for reading input data.

coords :Dict[str, Union[Dict[Pattern, RetrievedVariableConfig], RetrievedVariableConfig]]

data_vars :Dict[str, Union[Dict[Pattern, RetrievedVariableConfig], RetrievedVariableConfig]]

readers :Dict[Pattern, DataReaderConfig]

Class Methods

Method Descriptions

classmethod coerce_to_patterned_retriever(cls, var_dict: Dict[str, Union[Dict[Pattern, RetrievedVariableConfig], RetrievedVariableConfig]]) → Dict[str, Dict[Pattern[str], RetrievedVariableConfig]]

class tsdat.SortDatasetByCoordinate[source]

Sorts the dataset by the failed variable, if there are any failures.

class Parameters

Bases: pydantic.BaseModel

ascending :bool = True: Whether to sort the dataset in ascending order. Defaults to True.

parameters :SortDatasetByCoordinate.Parameters

Class Methods