tsdat.qc

The tsdat.qc package provides the classes that the data pipeline uses to manage quality control/quality assurance for the dataset. This includes the infrastructure to run quality tests and handle failures, as well as specific checkers and handlers that can be specified in the pipeline config file.

We warmly welcome community contributions to increase this default list.

Submodules

Classes

CheckFailMax

Check that no values for the specified variable are greater than the maximum value set by the ‘fail_range’ attribute.

CheckFailMin

Check that no values for the specified variable are less than the minimum value set by the ‘fail_range’ attribute.

CheckMax

Check that no values for the specified variable are greater than a specified maximum threshold.

CheckMin

Check that no values for the specified variable are less than a specified minimum threshold.

CheckMissing

Checks if any values are assigned to _FillValue or ‘NaN’ (for non-time variables) or to ‘NaT’ (for time variables).

CheckMonotonic

Checks that all values for the specified variable are either strictly increasing or strictly decreasing.

CheckValidDelta

Check that the difference between any two consecutive values is not greater than the threshold set by the ‘valid_delta’ attribute.

CheckValidMax

Check that no values for the specified variable are greater than the maximum value set by the ‘valid_range’ attribute.

CheckValidMin

Check that no values for the specified variable are less than the minimum value set by the ‘valid_range’ attribute.

CheckWarnMax

Check that no values for the specified variable are greater than the maximum value set by the ‘warn_range’ attribute.

CheckWarnMin

Check that no values for the specified variable are less than the minimum value set by the ‘warn_range’ attribute.

FailPipeline

Throw an exception, halting the pipeline and indicating a critical error.

QCParamKeys

Symbolic constants used for referencing QC-related fields in the pipeline config file.

QualityChecker

Class containing the code to perform a single Quality Check on a Dataset variable.

QualityHandler

Class containing code to be executed if a particular quality check fails.

QualityManagement

Class that provides static helper functions for providing quality control checks on a tsdat-standardized xarray dataset.

QualityManager

Applies a single Quality Manager to the given Dataset, as defined by the Config.

RecordQualityResults

Record the results of the quality check in an ancillary qc variable.

RemoveFailedValues

Replace all the failed values with _FillValue

SendEmailAWS

Send an email to the recipients using AWS services.

class tsdat.qc.CheckFailMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)[source]

Bases: CheckMax

Check that no values for the specified variable are greater than the maximum value set by the ‘fail_range’ attribute. If the variable in question does not possess the ‘fail_range’ attribute, this check will be skipped.

class tsdat.qc.CheckFailMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)[source]

Bases: CheckMin

Check that no values for the specified variable are less than the minimum value set by the ‘fail_range’ attribute. If the variable in question does not possess the ‘fail_range’ attribute, this check will be skipped.
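As a rough illustration of how a range attribute can feed both a minimum and a maximum check, the sketch below assumes that ‘fail_range’ holds a two-element [min, max] pair. That layout and the helper name fail_range_bounds are assumptions made for this example; they are not taken from tsdat.

```python
from typing import Optional, Sequence, Tuple


def fail_range_bounds(attrs: dict) -> Optional[Tuple[float, float]]:
    """Return (minimum, maximum) from a hypothetical 'fail_range' attribute.

    Assumes the attribute is a [min, max] pair. Returns None when the
    attribute is absent, mirroring the "check will be skipped" behaviour.
    """
    fail_range: Optional[Sequence[float]] = attrs.get("fail_range")
    if fail_range is None:
        return None
    return fail_range[0], fail_range[-1]


temperature_attrs = {"units": "degC", "fail_range": [-40.0, 60.0]}
bounds = fail_range_bounds(temperature_attrs)   # both bounds available
skipped = fail_range_bounds({"units": "degC"})  # no attribute, nothing to check
```

Under this assumption, a minimum check would compare against the first element and a maximum check against the last.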

class tsdat.qc.CheckMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: QualityChecker

Check that no values for the specified variable are greater than a specified maximum threshold. The threshold value is an attribute set on the variable in question. The attribute name is specified in the quality checker definition in the pipeline config file by setting a param called ‘key: ATTRIBUTE_NAME’.

If the key parameter is not set or the variable does not possess the specified attribute, this check will be skipped.
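The behaviour described above can be sketched with plain NumPy. This is a simplified stand-in, not the tsdat implementation: the function name check_max, the dictionary of attributes, and the attribute name valid_max are all hypothetical.

```python
from typing import Optional

import numpy as np


def check_max(values: np.ndarray, attrs: dict, key: Optional[str]) -> Optional[np.ndarray]:
    """Flag values above the threshold stored under attrs[key].

    Returns None (check skipped) when no key is given or the variable
    does not carry the named attribute.
    """
    if key is None or key not in attrs:
        return None
    # True marks a failed value; False marks a value that passed.
    return np.asarray(values) > attrs[key]


wind_speed = np.array([3.0, 12.5, 48.0, 7.2])
failed = check_max(wind_speed, {"valid_max": 30.0}, key="valid_max")
skipped = check_max(wind_speed, {}, key="valid_max")
```

Only the third value exceeds the threshold, so it is the only one flagged; the second call is skipped because the attribute is missing.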

Class Methods

run

Check a dataset’s variable to see if it passes a quality check.

Method Descriptions

run(self, variable_name: str) → Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return an ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: QualityChecker

Check that no values for the specified variable are less than a specified minimum threshold. The threshold value is an attribute set on the variable in question. The attribute name is specified in the quality checker definition in the pipeline config file by setting a param called ‘key: ATTRIBUTE_NAME’.

If the key parameter is not set or the variable does not possess the specified attribute, this check will be skipped.

Class Methods

run

Check a dataset’s variable to see if it passes a quality check.

Method Descriptions

run(self, variable_name: str) → Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return an ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckMissing(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: QualityChecker

Checks if any values are assigned to _FillValue or ‘NaN’ (for non-time variables) or checks if values are assigned to ‘NaT’ (for time variables). Also, for non-time variables, checks if values are above or below valid_range, as this is considered missing as well.
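A minimal NumPy sketch of the three conditions listed above follows. The function name find_missing and its arguments are hypothetical, and the real checker reads _FillValue and valid_range from the dataset rather than from function arguments.

```python
import numpy as np


def find_missing(values, fill_value=None, valid_range=None) -> np.ndarray:
    """Return a boolean mask where True marks a missing value."""
    values = np.asarray(values)
    if np.issubdtype(values.dtype, np.datetime64):
        # Time variables: a value is missing when it is NaT.
        return np.isnat(values)
    # Non-time variables: NaN, the fill value, or anything outside valid_range.
    missing = np.isnan(values)
    if fill_value is not None:
        missing |= values == fill_value
    if valid_range is not None:
        low, high = valid_range
        missing |= (values < low) | (values > high)
    return missing


temps = np.array([1.5, np.nan, -9999.0, 500.0, 4.0])
missing_temps = find_missing(temps, fill_value=-9999.0, valid_range=(-50.0, 50.0))

times = np.array(["2021-01-01T00:00", "NaT"], dtype="datetime64[s]")
missing_times = find_missing(times)
```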

Class Methods

run

Check a dataset’s variable to see if it passes a quality check.

Method Descriptions

run(self, variable_name: str) → Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return an ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckMonotonic(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: QualityChecker

Checks that all values for the specified variable are either strictly increasing or strictly decreasing.
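To make "strictly" concrete, here is a small sketch using numpy.diff. It is a simplification: it answers yes or no for the whole variable, whereas the checker itself reports a per-value array, and the function name is_strictly_monotonic is hypothetical.

```python
import numpy as np


def is_strictly_monotonic(values) -> bool:
    """True when every step is positive, or every step is negative."""
    steps = np.diff(np.asarray(values))
    return bool(np.all(steps > 0) or np.all(steps < 0))


increasing = is_strictly_monotonic([1, 2, 3, 5])
decreasing = is_strictly_monotonic([5, 4, 1])
repeated = is_strictly_monotonic([1, 2, 2, 3])   # a repeated value is not strict
reversed_step = is_strictly_monotonic([1, 3, 2])
```

Note that a repeated value fails the check, because a zero step is neither strictly increasing nor strictly decreasing.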

Class Methods

run

Check a dataset’s variable to see if it passes a quality check.

Method Descriptions

run(self, variable_name: str) → Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return an ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckValidDelta(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: QualityChecker

Check that the difference between any two consecutive values is not greater than the threshold set by the ‘valid_delta’ attribute. If the variable in question does not possess the ‘valid_delta’ attribute, this check will be skipped.
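One way to picture a delta check is shown below. It is a sketch under stated assumptions: the later value of each offending pair is the one flagged, the absolute difference is compared, and the last value of the previous interval (when supplied) is compared against the first value. The name check_valid_delta and its arguments are hypothetical.

```python
from typing import Optional

import numpy as np


def check_valid_delta(values, valid_delta: float,
                      previous_value: Optional[float] = None) -> np.ndarray:
    """Flag each value whose jump from its predecessor exceeds valid_delta."""
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return np.zeros(0, dtype=bool)
    # Without previous data the first value is compared with itself (delta 0).
    first = values[0] if previous_value is None else previous_value
    deltas = np.abs(np.diff(values, prepend=first))
    return deltas > valid_delta


pressure = np.array([10.0, 10.5, 14.0, 14.2])
flags = check_valid_delta(pressure, valid_delta=2.0)
flags_with_previous = check_valid_delta(pressure, valid_delta=2.0, previous_value=7.0)
```

Passing the previous interval's last value lets the jump across a file boundary be caught as well.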

Class Methods

run

Check a dataset’s variable to see if it passes a quality check.

Method Descriptions

run(self, variable_name: str) → Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return an ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckValidMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)[source]

Bases: CheckMax

Check that no values for the specified variable are greater than the maximum value set by the ‘valid_range’ attribute. If the variable in question does not possess the ‘valid_range’ attribute, this check will be skipped.

class tsdat.qc.CheckValidMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)[source]

Bases: CheckMin

Check that no values for the specified variable are less than the minimum value set by the ‘valid_range’ attribute. If the variable in question does not possess the ‘valid_range’ attribute, this check will be skipped.

class tsdat.qc.CheckWarnMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)[source]

Bases: CheckMax

Check that no values for the specified variable are greater than the maximum value set by the ‘warn_range’ attribute. If the variable in question does not possess the ‘warn_range’ attribute, this check will be skipped.

class tsdat.qc.CheckWarnMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)[source]

Bases: CheckMin

Check that no values for the specified variable are less than the minimum value set by the ‘warn_range’ attribute. If the variable in question does not possess the ‘warn_range’ attribute, this check will be skipped.

class tsdat.qc.FailPipeline(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: QualityHandler

Throw an exception, halting the pipeline and indicating a critical error.

Class Methods

run

Perform a follow-on action if a quality check fails.

Method Descriptions

run(self, variable_name: str, results_array: numpy.ndarray)

Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with a missing value), to email a contact person, or to raise an exception if the failure constitutes a critical error.

Parameters
  • variable_name (str) – Name of the variable that failed

  • results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.
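A handler of this kind can be sketched in a few lines. The QCError class below is a local stand-in defined only so the example runs on its own, and the function name fail_pipeline and the message wording are hypothetical.

```python
import numpy as np


class QCError(Exception):
    """Stand-in for a fatal quality-control error (defined locally for this sketch)."""


def fail_pipeline(variable_name: str, results_array: np.ndarray) -> None:
    """Raise an exception if any value of the variable failed its check."""
    if np.any(results_array):
        n_failed = int(np.count_nonzero(results_array))
        raise QCError(
            f"Quality check failed for '{variable_name}': {n_failed} value(s) flagged."
        )


# No failures: the call returns normally and the pipeline would continue.
fail_pipeline("temperature", np.array([False, False]))

# At least one failure: the exception stops further processing.
try:
    fail_pipeline("temperature", np.array([False, True, True]))
except QCError as err:
    message = str(err)
```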

class tsdat.qc.QCParamKeys[source]

Symbolic constants used for referencing QC-related fields in the pipeline config file

ASSESSMENT = assessment
CORRECTION = correction
QC_BIT = bit
TEST_MEANING = meaning

class tsdat.qc.QualityChecker(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: abc.ABC

Class containing the code to perform a single Quality Check on a Dataset variable.

Parameters
  • ds (xr.Dataset) – The dataset the checker will be applied to

  • previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monotonic or delta checks when we need to check the previous value.

  • definition (QualityManagerDefinition) – The quality manager definition as specified in the pipeline config file

  • parameters (dict, optional) – A dictionary of checker-specific parameters specified in the pipeline config file. Defaults to {}

Class Methods

run

Check a dataset’s variable to see if it passes a quality check.

Method Descriptions

abstract run(self, variable_name: str) → Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return an ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]
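The shape of a custom checker can be illustrated without importing tsdat. In the sketch below, CheckerSketch is a minimal stand-in for the abstract interface (it takes a plain dictionary of arrays instead of an xarray Dataset and omits previous_data and definition), and CheckNegative is a hypothetical check invented for the example.

```python
import abc
from typing import Dict, Optional

import numpy as np


class CheckerSketch(abc.ABC):
    """Minimal stand-in for the checker interface; not tsdat code."""

    def __init__(self, ds: Dict[str, np.ndarray], parameters: Optional[dict] = None):
        self.ds = ds
        # Treat a missing parameters argument as an empty dictionary.
        self.params = parameters if parameters is not None else {}

    @abc.abstractmethod
    def run(self, variable_name: str) -> Optional[np.ndarray]:
        """Return a boolean failure mask, or None if the check is skipped."""


class CheckNegative(CheckerSketch):
    """Hypothetical custom check: flag values below zero."""

    def run(self, variable_name: str) -> Optional[np.ndarray]:
        if variable_name not in self.ds:
            return None  # nothing to check, so the check is skipped
        return self.ds[variable_name] < 0


dataset = {"rainfall": np.array([0.0, 1.2, -0.4])}
results = CheckNegative(dataset).run("rainfall")
skipped = CheckNegative(dataset).run("snowfall")
```

The key contract is the return value: a boolean array where True means failure, or None when the check does not apply.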

class tsdat.qc.QualityHandler(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: abc.ABC

Class containing code to be executed if a particular quality check fails.

Parameters
  • ds (xr.Dataset) – The dataset the handler will be applied to

  • previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monotonic or delta checks when we need to check the previous value.

  • quality_manager (QualityManagerDefinition) – The quality_manager definition as specified in the pipeline config file

  • parameters (dict, optional) – A dictionary of handler-specific parameters specified in the pipeline config file. Defaults to {}

Class Methods

record_correction

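If a correction was made to variable data to fix invalid values as detected by a quality check, this method will record the fix to the appropriate variable attribute.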

run

Perform a follow-on action if a quality check fails.

Method Descriptions

record_correction(self, variable_name: str)

If a correction was made to variable data to fix invalid values as detected by a quality check, this method will record the fix to the appropriate variable attribute. The correction description will come from the handler params which get set in the pipeline config file.

Parameters

variable_name (str) – Name of the variable that was corrected

abstract run(self, variable_name: str, results_array: numpy.ndarray)

Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with a missing value), to email a contact person, or to raise an exception if the failure constitutes a critical error.

Parameters
  • variable_name (str) – Name of the variable that failed

  • results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.

class tsdat.qc.QualityManagement[source]

Class that provides static helper functions for providing quality control checks on a tsdat-standardized xarray dataset.

Class Methods

run

Applies the Quality Managers defined in the given Config to this dataset.

Method Descriptions

static run(ds: xarray.Dataset, config: tsdat.config.Config, previous_data: xarray.Dataset) → xarray.Dataset

Applies the Quality Managers defined in the given Config to this dataset. QC results will be embedded in the dataset. QC metadata will be stored as attributes, and QC flags will be stored as a bitwise integer in new companion qc_ variables that are added to the dataset. This method will create QC companion variables if they don’t exist.

Parameters
  • ds (xr.Dataset) – The dataset to apply quality managers to

  • config (Config) – A configuration definition (loaded from yaml)

  • previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monotonic or delta checks when we need to check the previous value.

Returns

The dataset after quality checkers and handlers have been applied.

Return type

xr.Dataset
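Storing QC flags as a bitwise integer means each quality test owns one bit of the companion variable, so several test results can share a single integer per data point. The sketch below illustrates the idea with NumPy; the helper name set_qc_bit and the 1-based bit numbering are assumptions made for the example.

```python
import numpy as np


def set_qc_bit(qc_flags: np.ndarray, results_array: np.ndarray, bit: int) -> np.ndarray:
    """Return qc_flags with the given 1-based bit set wherever a check failed."""
    mask = np.int32(1 << (bit - 1))
    return np.where(results_array, qc_flags | mask, qc_flags)


# Companion flags for a four-value variable, initially all clear.
qc_temperature = np.zeros(4, dtype=np.int32)

# The first test fails for values 1 and 3; the second for values 2 and 3.
qc_temperature = set_qc_bit(qc_temperature, np.array([False, True, False, True]), bit=1)
qc_temperature = set_qc_bit(qc_temperature, np.array([False, False, True, True]), bit=2)

# Recover which values failed the second test by masking its bit.
failed_second_test = (qc_temperature & 2) != 0
```

A flag value of 3 therefore means both tests failed for that data point, while 0 means every test passed.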

class tsdat.qc.QualityManager(ds: xarray.Dataset, config: tsdat.config.Config, definition: tsdat.config.QualityManagerDefinition, previous_data: xarray.Dataset)[source]

Applies a single Quality Manager to the given Dataset, as defined by the Config

Parameters
  • ds (xr.Dataset) – The dataset for which we will perform quality management.

  • config (Config) – The Config from the pipeline definition file.

  • definition (QualityManagerDefinition) – Definition of the quality test this class manages.

  • previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monotonic or delta checks when we need to check the previous value.

Class Methods

run

Runs the QualityChecker and QualityHandler(s) for each specified variable as defined in the config file.

Method Descriptions

run(self) → xarray.Dataset

Runs the QualityChecker and QualityHandler(s) for each specified variable as defined in the config file.

Returns

The dataset after the quality checker and the quality handlers have been run.

Raises

QCError – A QCError indicates that a fatal error has occurred.

Return type

xr.Dataset
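The checker-then-handlers flow can be pictured with plain functions. This is a sketch, not the tsdat control flow: run_manager and the callables are hypothetical, and it assumes handlers are invoked whenever the check actually ran (returned an array rather than None), which this reference does not state explicitly.

```python
from typing import Callable, List, Optional

import numpy as np

Checker = Callable[[str], Optional[np.ndarray]]
Handler = Callable[[str, np.ndarray], None]


def run_manager(variables: List[str], checker: Checker, handlers: List[Handler]) -> None:
    """Run one checker on each variable, then pass its results to every handler."""
    for variable_name in variables:
        results_array = checker(variable_name)
        if results_array is None:
            continue  # the check was skipped for this variable
        for handler in handlers:
            handler(variable_name, results_array)


data = {"pressure": np.array([1010.0, -5.0]), "humidity": np.array([40.0, 55.0])}
minimums = {"pressure": 0.0}  # only pressure has a threshold defined
handled = []


def check_min(name: str) -> Optional[np.ndarray]:
    if name not in minimums:
        return None
    return data[name] < minimums[name]


def record(name: str, results_array: np.ndarray) -> None:
    handled.append((name, int(results_array.sum())))


run_manager(["pressure", "humidity"], check_min, [record])
```

Here humidity is skipped because it has no threshold, so only pressure reaches the handler.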

class tsdat.qc.RecordQualityResults(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: QualityHandler

Record the results of the quality check in an ancillary qc variable.

Class Methods

run

Perform a follow-on action if a quality check fails.

Method Descriptions

run(self, variable_name: str, results_array: numpy.ndarray)

Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with a missing value), to email a contact person, or to raise an exception if the failure constitutes a critical error.

Parameters
  • variable_name (str) – Name of the variable that failed

  • results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.

class tsdat.qc.RemoveFailedValues(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: QualityHandler

Replace all the failed values with _FillValue

Class Methods

run

Perform a follow-on action if a quality check fails.

Method Descriptions

run(self, variable_name: str, results_array: numpy.ndarray)

Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with a missing value), to email a contact person, or to raise an exception if the failure constitutes a critical error.

Parameters
  • variable_name (str) – Name of the variable that failed

  • results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.
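Replacing failed values amounts to a masked assignment. The sketch below shows the idea with numpy.where; the function name remove_failed_values and the fill value of -9999.0 are hypothetical, and the real handler takes the fill value from the variable's _FillValue.

```python
import numpy as np


def remove_failed_values(values: np.ndarray, results_array: np.ndarray,
                         fill_value: float) -> np.ndarray:
    """Return a copy of values with every failed entry replaced by fill_value."""
    return np.where(results_array, fill_value, values)


salinity = np.array([35.1, 99.0, 34.8])
failed = np.array([False, True, False])
cleaned = remove_failed_values(salinity, failed, fill_value=-9999.0)
```

Because numpy.where returns a new array, the original salinity values are left untouched in this sketch.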

class tsdat.qc.SendEmailAWS(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)[source]

Bases: QualityHandler

Send an email to the recipients using AWS services.

Class Methods

run

Perform a follow-on action if a quality check fails.

Method Descriptions

run(self, variable_name: str, results_array: numpy.ndarray)

Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with a missing value), to email a contact person, or to raise an exception if the failure constitutes a critical error.

Parameters
  • variable_name (str) – Name of the variable that failed

  • results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.