tsdat.qc

The tsdat.qc package provides the classes that the data pipeline uses to manage quality control/quality assurance for the dataset. This includes the infrastructure to run quality tests and handle failures, as well specific checkers and handlers that can be specified in the pipeline config file.

We warmly welcome community contribututions to increase this default list.

Package Contents

Classes

QualityManagement

Class that provides static helper functions for providing quality

QualityManager

Applies a single Quality Manager to the given Dataset, as defined by

QualityChecker

Class containing the code to perform a single Quality Check on a

CheckWarnMax

Check that no values for the specified variable are greater than

CheckFailMax

Check that no values for the specified variable greater less than

CheckFailMin

Check that no values for the specified variable are less than

CheckMax

Check that no values for the specified variable are greater than

CheckMin

Check that no values for the specified variable are less than

CheckMissing

Checks if any values are assigned to _FillValue or ‘NaN’ (for non-time

CheckMonotonic

Checks that all values for the specified variable are either

CheckValidDelta

Check that the difference between any two consecutive

CheckValidMax

Check that no values for the specified variable are greater than

CheckValidMin

Check that no values for the specified variable are less than

CheckWarnMin

Check that no values for the specified variable are less than

QualityHandler

Class containing code to be executed if a particular quality check fails.

QCParamKeys

Symbolic constants used for referencing QC-related

FailPipeline

Throw an exception, halting the pipeline & indicating a critical error

RecordQualityResults

Record the results of the quality check in an ancillary qc variable.

RemoveFailedValues

Replace all the failed values with _FillValue

SendEmailAWS

Send an email to the recipients using AWS services.

class tsdat.qc.QualityManagement

Class that provides static helper functions for providing quality control checks on a tsdat-standardized xarray dataset.

static run(ds: xarray.Dataset, config: tsdat.config.Config, previous_data: xarray.Dataset)xarray.Dataset

Applies the Quality Managers defined in the given Config to this dataset. QC results will be embedded in the dataset. QC metadata will be stored as attributes, and QC flags will be stored as a bitwise integer in new companion qc_ variables that are added to the dataset. This method will create QC companion variables if they don’t exist.

Parameters
  • ds (xr.Dataset) – The dataset to apply quality managers to

  • config (Config) – A configuration definition (loaded from yaml)

  • previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monitonic or delta checks when we need to check the previous value.

Returns

The dataset after quality checkers and handlers have been applied.

Return type

xr.Dataset

class tsdat.qc.QualityManager(ds: xarray.Dataset, config: tsdat.config.Config, definition: tsdat.config.QualityManagerDefinition, previous_data: xarray.Dataset)

Applies a single Quality Manager to the given Dataset, as defined by the Config

Parameters
  • ds (xr.Dataset) – The dataset for which we will perform quality management.

  • config (Config) – The Config from the pipeline definition file.

  • definition (QualityManagerDefinition) – Definition of the quality test this class manages.

  • previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monitonic or delta checks when we need to check the previous value.

run(self)xarray.Dataset

Runs the QualityChecker and QualityHandler(s) for each specified variable as defined in the config file.

Returns

The dataset after the quality checker and the quality handlers have been run.

Raises

QCError – A QCError indicates that a fatal error has occurred.

Return type

xr.Dataset

class tsdat.qc.QualityChecker(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: abc.ABC

Class containing the code to perform a single Quality Check on a Dataset variable.

Parameters
  • ds (xr.Dataset) – The dataset the checker will be applied to

  • previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monitonic or delta checks when we need to check the previous value.

  • definition (QualityManagerDefinition) – The quality manager definition as specified in the pipeline config file

  • parameters (dict, optional) – A dictionary of checker-specific parameters specified in the pipeline config file. Defaults to {}

abstract run(self, variable_name: str)Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return a ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmectic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckWarnMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)

Bases: CheckMax

Check that no values for the specified variable are greater than the maximum vaue set by the ‘warn_range’ attribute. If the variable in question does not posess the ‘warn_range’ attribute, this check will be skipped.

class tsdat.qc.CheckFailMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)

Bases: CheckMax

Check that no values for the specified variable greater less than the maximum vaue set by the ‘fail_range’ attribute. If the variable in question does not posess the ‘fail_range’ attribute, this check will be skipped.

class tsdat.qc.CheckFailMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)

Bases: CheckMin

Check that no values for the specified variable are less than the minimum vaue set by the ‘fail_range’ attribute. If the variable in question does not posess the ‘fail_range’ attribute, this check will be skipped.

class tsdat.qc.CheckMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: QualityChecker

Check that no values for the specified variable are greater than a specified maximum threshold. The threshold value is an attribute set on the variable in question. The attribute name is specified in the quality checker definition in the pipeline config file by setting a param called ‘key: ATTRIBUTE_NAME’.

If the key parameter is not set or the variable does not possess the specified attribute, this check will be skipped.

run(self, variable_name: str)Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return a ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmectic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: QualityChecker

Check that no values for the specified variable are less than a specified minimum threshold. The threshold value is an attribute set on the variable in question. The attribute name is specified in the quality checker definition in the pipeline config file by setting a param called ‘key: ATTRIBUTE_NAME’.

If the key parameter is not set or the variable does not possess the specified attribute, this check will be skipped.

run(self, variable_name: str)Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return a ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmectic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckMissing(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: QualityChecker

Checks if any values are assigned to _FillValue or ‘NaN’ (for non-time variables) or checks if values are assigned to ‘NaT’ (for time variables). Also, for non-time variables, checks if values are above or below valid_range, as this is considered missing as well.

run(self, variable_name: str)Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return a ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmectic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckMonotonic(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: QualityChecker

Checks that all values for the specified variable are either strictly increasing or strictly decreasing.

run(self, variable_name: str)Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return a ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmectic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckValidDelta(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: QualityChecker

Check that the difference between any two consecutive values is not greater than the threshold set by the ‘valid_delta’ attribute. If the variable in question does not posess the ‘valid_delta’ attribute, this check will be skipped.

run(self, variable_name: str)Optional[numpy.ndarray]

Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.

Parameters

variable_name (str) – The name of the variable to check

Returns

If the check was performed, return a ndarray of the same shape as the variable. Each value in the data array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.

Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmectic vector operations. So it’s easier to just use numpy arrays.

If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.

Return type

Optional[np.ndarray]

class tsdat.qc.CheckValidMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)

Bases: CheckMax

Check that no values for the specified variable are greater than the maximum vaue set by the ‘valid_range’ attribute. If the variable in question does not posess the ‘valid_range’ attribute, this check will be skipped.

class tsdat.qc.CheckValidMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)

Bases: CheckMin

Check that no values for the specified variable are less than the minimum vaue set by the ‘valid_range’ attribute. If the variable in question does not posess the ‘valid_range’ attribute, this check will be skipped.

class tsdat.qc.CheckWarnMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)

Bases: CheckMin

Check that no values for the specified variable are less than the minimum vaue set by the ‘warn_range’ attribute. If the variable in question does not posess the ‘warn_range’ attribute, this check will be skipped.

class tsdat.qc.QualityHandler(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: abc.ABC

Class containing code to be executed if a particular quality check fails.

Parameters
  • ds (xr.Dataset) – The dataset the handler will be applied to

  • previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monotonic or delta checks when we need to check the previous value.

  • quality_manager (QualityManagerDefinition) – The quality_manager definition as specified in the pipeline config file

  • parameters (dict, optional) – A dictionary of handler-specific parameters specified in the pipeline config file. Defaults to {}

abstract run(self, variable_name: str, results_array: numpy.ndarray)

Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with missing value, emailing a contact persion, or raising an exception if the failure constitutes a critical error).

Parameters
  • variable_name (str) – Name of the variable that failed

  • results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.

record_correction(self, variable_name: str)

If a correction was made to variable data to fix invalid values as detected by a quality check, this method will record the fix to the appropriate variable attribute. The correction description will come from the handler params which get set in the pipeline config file.

Parameters

variable_name (str) – Name

class tsdat.qc.QCParamKeys

Symbolic constants used for referencing QC-related fields in the pipeline config file

QC_BIT = bit
ASSESSMENT = assessment
TEST_MEANING = meaning
CORRECTION = correction
class tsdat.qc.FailPipeline(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: QualityHandler

Throw an exception, halting the pipeline & indicating a critical error

run(self, variable_name: str, results_array: numpy.ndarray)

Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with missing value, emailing a contact persion, or raising an exception if the failure constitutes a critical error).

Parameters
  • variable_name (str) – Name of the variable that failed

  • results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.

class tsdat.qc.RecordQualityResults(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: QualityHandler

Record the results of the quality check in an ancillary qc variable.

run(self, variable_name: str, results_array: numpy.ndarray)

Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with missing value, emailing a contact persion, or raising an exception if the failure constitutes a critical error).

Parameters
  • variable_name (str) – Name of the variable that failed

  • results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.

class tsdat.qc.RemoveFailedValues(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: QualityHandler

Replace all the failed values with _FillValue

run(self, variable_name: str, results_array: numpy.ndarray)

Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with missing value, emailing a contact persion, or raising an exception if the failure constitutes a critical error).

Parameters
  • variable_name (str) – Name of the variable that failed

  • results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.

class tsdat.qc.SendEmailAWS(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)

Bases: QualityHandler

Send an email to the recipients using AWS services.

run(self, variable_name: str, results_array: numpy.ndarray)

Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with missing value, emailing a contact persion, or raising an exception if the failure constitutes a critical error).

Parameters
  • variable_name (str) – Name of the variable that failed

  • results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.