tsdat.qc
The tsdat.qc package provides the classes that the data pipeline uses to manage quality control/quality assurance for the dataset. This includes the infrastructure to run quality tests and handle failures, as well as specific checkers and handlers that can be specified in the pipeline config file.
We warmly welcome community contributions to extend this default list.
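Submodules
Classes
CheckFailMax | Check that no values for the specified variable are greater than the maximum value set by the ‘fail_range’ attribute.
CheckFailMin | Check that no values for the specified variable are less than the minimum value set by the ‘fail_range’ attribute.
CheckMax | Check that no values for the specified variable are greater than a specified maximum threshold.
CheckMin | Check that no values for the specified variable are less than a specified minimum threshold.
CheckMissing | Checks if any values are assigned to _FillValue or ‘NaN’ (for non-time variables) or to ‘NaT’ (for time variables).
CheckMonotonic | Checks that all values for the specified variable are either strictly increasing or strictly decreasing.
CheckValidDelta | Check that the difference between any two consecutive values is not greater than the threshold set by the ‘valid_delta’ attribute.
CheckValidMax | Check that no values for the specified variable are greater than the maximum value set by the ‘valid_range’ attribute.
CheckValidMin | Check that no values for the specified variable are less than the minimum value set by the ‘valid_range’ attribute.
CheckWarnMax | Check that no values for the specified variable are greater than the maximum value set by the ‘warn_range’ attribute.
CheckWarnMin | Check that no values for the specified variable are less than the minimum value set by the ‘warn_range’ attribute.
FailPipeline | Throw an exception, halting the pipeline and indicating a critical error.
QCParamKeys | Symbolic constants used for referencing QC-related fields in the pipeline config file.
QualityChecker | Class containing the code to perform a single Quality Check on a Dataset variable.
QualityHandler | Class containing code to be executed if a particular quality check fails.
QualityManagement | Class that provides static helper functions for providing quality control checks on a tsdat-standardized xarray dataset.
QualityManager | Applies a single Quality Manager to the given Dataset, as defined by the Config.
RecordQualityResults | Record the results of the quality check in an ancillary qc variable.
RemoveFailedValues | Replace all the failed values with _FillValue.
SendEmailAWS | Send an email to the recipients using AWS services.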
class tsdat.qc.CheckFailMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)
Bases: CheckMax
Check that no values for the specified variable are greater than the maximum value set by the ‘fail_range’ attribute. If the variable in question does not possess the ‘fail_range’ attribute, this check will be skipped.
class tsdat.qc.CheckFailMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)
Bases: CheckMin
Check that no values for the specified variable are less than the minimum value set by the ‘fail_range’ attribute. If the variable in question does not possess the ‘fail_range’ attribute, this check will be skipped.
class tsdat.qc.CheckMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: QualityChecker
Check that no values for the specified variable are greater than a specified maximum threshold. The threshold value is an attribute set on the variable in question. The attribute name is specified in the quality checker definition in the pipeline config file by setting a param called ‘key: ATTRIBUTE_NAME’.
If the key parameter is not set or the variable does not possess the specified attribute, this check will be skipped.
Class Methods
run | Check a dataset’s variable to see if it passes a quality check.
Method Descriptions
run(self, variable_name: str) → Optional[numpy.ndarray]
Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.
- Parameters
variable_name (str) – The name of the variable to check
- Returns
If the check was performed, return an ndarray of the same shape as the variable. Each value in the array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.
Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations, so it is easier to just use numpy arrays.
If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.
- Return type
Optional[np.ndarray]
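The behavior described for CheckMax can be sketched with plain numpy. This is a minimal illustration, not the tsdat implementation: the function name check_max, the attrs mapping standing in for the variable's attributes, and the attribute name max_allowed are all assumptions made for the example.

```python
from typing import Mapping, Optional

import numpy as np


def check_max(values, attrs: Mapping, key: Optional[str]) -> Optional[np.ndarray]:
    """Return True where a value exceeds the threshold stored under attrs[key]."""
    # Skip the check (return None) when no key is configured or the
    # variable does not carry the named attribute.
    if key is None or key not in attrs:
        return None
    # Vectorized comparison: True marks a value that failed the check.
    return np.asarray(values) > attrs[key]


# Hypothetical variable data and attributes.
temperature = np.array([12.0, 48.5, 51.2, 30.1])
failed = check_max(temperature, {"max_allowed": 50.0}, "max_allowed")
skipped = check_max(temperature, {}, "max_allowed")
```

Here failed marks only the third value, and skipped is None because the attribute is absent.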
class tsdat.qc.CheckMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: QualityChecker
Check that no values for the specified variable are less than a specified minimum threshold. The threshold value is an attribute set on the variable in question. The attribute name is specified in the quality checker definition in the pipeline config file by setting a param called ‘key: ATTRIBUTE_NAME’.
If the key parameter is not set or the variable does not possess the specified attribute, this check will be skipped.
Class Methods
run | Check a dataset’s variable to see if it passes a quality check.
Method Descriptions
run(self, variable_name: str) → Optional[numpy.ndarray]
Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.
- Parameters
variable_name (str) – The name of the variable to check
- Returns
If the check was performed, return an ndarray of the same shape as the variable. Each value in the array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.
Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations, so it is easier to just use numpy arrays.
If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.
- Return type
Optional[np.ndarray]
class tsdat.qc.CheckMissing(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: QualityChecker
Checks if any values are assigned to _FillValue or ‘NaN’ (for non-time variables) or to ‘NaT’ (for time variables). For non-time variables, values outside valid_range (above or below it) are also considered missing.
Class Methods
run | Check a dataset’s variable to see if it passes a quality check.
Method Descriptions
run(self, variable_name: str) → Optional[numpy.ndarray]
Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.
- Parameters
variable_name (str) – The name of the variable to check
- Returns
If the check was performed, return an ndarray of the same shape as the variable. Each value in the array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.
Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations, so it is easier to just use numpy arrays.
If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.
- Return type
Optional[np.ndarray]
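The missing-value rules for CheckMissing can be illustrated with a short numpy sketch. The function check_missing and its fill_value and valid_range arguments are hypothetical stand-ins for the variable's _FillValue and valid_range attributes; the real class reads these from the dataset.

```python
import numpy as np


def check_missing(values, fill_value=None, valid_range=None) -> np.ndarray:
    """Return True where a value counts as missing."""
    values = np.asarray(values)
    # Time variables: a value is missing when it is NaT.
    if np.issubdtype(values.dtype, np.datetime64):
        return np.isnat(values)
    # Non-time variables: NaN, the fill value, or anything outside valid_range.
    numeric = values.astype(float)
    missing = np.isnan(numeric)
    if fill_value is not None:
        missing |= numeric == fill_value
    if valid_range is not None:
        low, high = valid_range
        missing |= (numeric < low) | (numeric > high)
    return missing


# Hypothetical example data.
humidity = np.array([1.0, np.nan, -9999.0, 250.0])
missing_humidity = check_missing(humidity, fill_value=-9999.0, valid_range=[0.0, 100.0])
times = np.array(["2021-01-01", "NaT"], dtype="datetime64[ns]")
missing_times = check_missing(times)
```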
class tsdat.qc.CheckMonotonic(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: QualityChecker
Checks that all values for the specified variable are either strictly increasing or strictly decreasing.
Class Methods
run | Check a dataset’s variable to see if it passes a quality check.
Method Descriptions
run(self, variable_name: str) → Optional[numpy.ndarray]
Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.
- Parameters
variable_name (str) – The name of the variable to check
- Returns
If the check was performed, return an ndarray of the same shape as the variable. Each value in the array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.
Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations, so it is easier to just use numpy arrays.
If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.
- Return type
Optional[np.ndarray]
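One possible way to flag violations of strict monotonicity is sketched below for a one-dimensional numeric array. How tsdat chooses the expected direction and which element of a bad pair it flags are not stated here, so both choices in this sketch (majority direction, flag the later value) are assumptions.

```python
import numpy as np


def check_monotonic(values) -> np.ndarray:
    """Return True where a value breaks strict monotonic order (1-D numeric input)."""
    values = np.asarray(values)
    failed = np.zeros(values.shape, dtype=bool)
    if values.size < 2:
        return failed
    steps = np.diff(values)
    # Assumption: the intended direction is the one that most steps follow.
    increasing = np.count_nonzero(steps > 0) >= np.count_nonzero(steps < 0)
    bad_step = steps <= 0 if increasing else steps >= 0
    # Assumption: a bad step flags the later of the two values it connects.
    failed[1:] = bad_step
    return failed


repeated = check_monotonic(np.array([1, 2, 2, 3, 5]))
decreasing = check_monotonic(np.array([5, 4, 3]))
```

A repeated value fails because the ordering must be strict; a cleanly decreasing series passes.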
class tsdat.qc.CheckValidDelta(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: QualityChecker
Check that the difference between any two consecutive values is not greater than the threshold set by the ‘valid_delta’ attribute. If the variable in question does not possess the ‘valid_delta’ attribute, this check will be skipped.
Class Methods
run | Check a dataset’s variable to see if it passes a quality check.
Method Descriptions
run(self, variable_name: str) → Optional[numpy.ndarray]
Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.
- Parameters
variable_name (str) – The name of the variable to check
- Returns
If the check was performed, return an ndarray of the same shape as the variable. Each value in the array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.
Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations, so it is easier to just use numpy arrays.
If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.
- Return type
Optional[np.ndarray]
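A delta check of this kind can be sketched as follows for a one-dimensional variable. The function check_valid_delta and its previous_last argument (the final value from the previous processing interval) are hypothetical; the sketch only shows how consecutive differences could be compared against a threshold while keeping the result the same shape as the input.

```python
from typing import Optional

import numpy as np


def check_valid_delta(values, valid_delta: Optional[float],
                      previous_last: Optional[float] = None) -> Optional[np.ndarray]:
    """Return True where the jump from the preceding value exceeds valid_delta."""
    # No valid_delta attribute means the check is skipped.
    if valid_delta is None:
        return None
    values = np.asarray(values, dtype=float)
    # Prepend the last value of the previous interval when available;
    # otherwise repeat the first value so its delta is zero.
    first = values[:1] if previous_last is None else np.array([previous_last])
    deltas = np.abs(np.diff(np.concatenate([first, values])))
    return deltas > valid_delta


pressure = np.array([1.0, 1.5, 4.0, 4.2])
within_file = check_valid_delta(pressure, valid_delta=2.0)
across_files = check_valid_delta(pressure, valid_delta=2.0, previous_last=-5.0)
```

Passing previous_last shows why a checker receives previous_data: the first value of a file can only be judged against the last value of the file before it.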
class tsdat.qc.CheckValidMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)
Bases: CheckMax
Check that no values for the specified variable are greater than the maximum value set by the ‘valid_range’ attribute. If the variable in question does not possess the ‘valid_range’ attribute, this check will be skipped.
class tsdat.qc.CheckValidMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)
Bases: CheckMin
Check that no values for the specified variable are less than the minimum value set by the ‘valid_range’ attribute. If the variable in question does not possess the ‘valid_range’ attribute, this check will be skipped.
class tsdat.qc.CheckWarnMax(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)
Bases: CheckMax
Check that no values for the specified variable are greater than the maximum value set by the ‘warn_range’ attribute. If the variable in question does not possess the ‘warn_range’ attribute, this check will be skipped.
class tsdat.qc.CheckWarnMin(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters)
Bases: CheckMin
Check that no values for the specified variable are less than the minimum value set by the ‘warn_range’ attribute. If the variable in question does not possess the ‘warn_range’ attribute, this check will be skipped.
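The valid, warn, and fail range checks differ only in which attribute they read. The sketch below assumes each range attribute holds a [minimum, maximum] pair, which is an assumption about the attribute layout, and uses hypothetical helper names and a plain dict in place of the variable's attributes.

```python
from typing import Mapping, Optional

import numpy as np


def check_range_max(values, attrs: Mapping, range_attr: str) -> Optional[np.ndarray]:
    """Return True where a value is above the upper bound of attrs[range_attr]."""
    if range_attr not in attrs:
        return None  # attribute absent, so the check is skipped
    _, upper = attrs[range_attr]
    return np.asarray(values) > upper


def check_range_min(values, attrs: Mapping, range_attr: str) -> Optional[np.ndarray]:
    """Return True where a value is below the lower bound of attrs[range_attr]."""
    if range_attr not in attrs:
        return None
    lower, _ = attrs[range_attr]
    return np.asarray(values) < lower


# Hypothetical attributes: warn_range is narrower than valid_range.
attrs = {"valid_range": [-50.0, 60.0], "warn_range": [-20.0, 40.0]}
wind = np.array([-30.0, 10.0, 45.0, 75.0])
warn_high = check_range_max(wind, attrs, "warn_range")
valid_high = check_range_max(wind, attrs, "valid_range")
warn_low = check_range_min(wind, attrs, "warn_range")
fail_high = check_range_max(wind, attrs, "fail_range")
```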
class tsdat.qc.FailPipeline(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: QualityHandler
Throw an exception, halting the pipeline and indicating a critical error.
Class Methods
run | Perform a follow-on action if a quality check fails.
Method Descriptions
run(self, variable_name: str, results_array: numpy.ndarray)
Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with a missing value, emailing a contact person, or raising an exception if the failure constitutes a critical error).
- Parameters
variable_name (str) – Name of the variable that failed
results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.
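A handler that halts processing can be sketched as a function that raises when any value failed. The QCError class defined here is a local stand-in, and the assumption that the handler raises this particular exception type, as well as the message format, is illustrative only.

```python
import numpy as np


class QCError(Exception):
    """Local stand-in for a fatal quality control error (hypothetical definition)."""


def fail_pipeline(variable_name: str, results_array: np.ndarray) -> None:
    """Raise QCError if any entry of results_array is True (True means failed)."""
    if np.any(results_array):
        n_failed = int(np.count_nonzero(results_array))
        raise QCError(f"Quality check failed for '{variable_name}' ({n_failed} values)")


# No failures: returns quietly.
fail_pipeline("temperature", np.array([False, False]))

# One failure: the exception stops further processing.
try:
    fail_pipeline("temperature", np.array([False, True]))
    halted = False
except QCError:
    halted = True
```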
class tsdat.qc.QCParamKeys
Symbolic constants used for referencing QC-related fields in the pipeline config file
ASSESSMENT = assessment
CORRECTION = correction
QC_BIT = bit
TEST_MEANING = meaning
class tsdat.qc.QualityChecker(ds: xarray.Dataset, previous_data: xarray.Dataset, definition: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: abc.ABC
Class containing the code to perform a single Quality Check on a Dataset variable.
- Parameters
ds (xr.Dataset) – The dataset the checker will be applied to
previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monotonic or delta checks when we need to check the previous value.
definition (QualityManagerDefinition) – The quality manager definition as specified in the pipeline config file
parameters (dict, optional) – A dictionary of checker-specific parameters specified in the pipeline config file. Defaults to {}
Class Methods
run | Check a dataset’s variable to see if it passes a quality check.
Method Descriptions
abstract run(self, variable_name: str) → Optional[numpy.ndarray]
Check a dataset’s variable to see if it passes a quality check. These checks can be performed on the entire variable at one time by using xarray vectorized numerical operators.
- Parameters
variable_name (str) – The name of the variable to check
- Returns
If the check was performed, return an ndarray of the same shape as the variable. Each value in the array will be either True or False, depending upon the results of the check. True means the check failed. False means it succeeded.
Note that we are using an np.ndarray instead of an xr.DataArray because the DataArray contains coordinate indexes which can sometimes get out of sync when performing np arithmetic vector operations, so it is easier to just use numpy arrays.
If the check was skipped for some reason (i.e., it was not relevant given the current attributes defined for this dataset), then the run method should return None.
- Return type
Optional[np.ndarray]
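Writing a custom checker amounts to subclassing the abstract base and implementing run. To stay self-contained, the sketch below defines CheckerBase, a hypothetical stand-in that mirrors the documented constructor arguments, and uses a dict of numpy arrays in place of an xarray Dataset; the attribute names it stores (ds, previous_data, definition, params) are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional

import numpy as np


class CheckerBase(ABC):
    """Hypothetical stand-in mirroring the documented QualityChecker constructor."""

    def __init__(self, ds, previous_data, definition, parameters: Optional[Dict] = None):
        self.ds = ds
        self.previous_data = previous_data
        self.definition = definition
        # Fall back to an empty dict when no parameters are supplied.
        self.params = parameters if parameters is not None else {}

    @abstractmethod
    def run(self, variable_name: str) -> Optional[np.ndarray]:
        """Return a boolean array (True means failed) or None if skipped."""


class CheckNotNegative(CheckerBase):
    """Example custom check: flag values below zero."""

    def run(self, variable_name: str) -> Optional[np.ndarray]:
        if variable_name not in self.ds:
            return None  # nothing to check, so the check is skipped
        return np.asarray(self.ds[variable_name]) < 0


# A dict of arrays stands in for the dataset.
ds = {"rainfall": np.array([0.0, 1.2, -0.4])}
checker = CheckNotNegative(ds, previous_data=None, definition=None)
rain_failed = checker.run("rainfall")
```

With the real package, the subclass would derive from tsdat.qc.QualityChecker instead of the stand-in.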
class tsdat.qc.QualityHandler(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: abc.ABC
Class containing code to be executed if a particular quality check fails.
- Parameters
ds (xr.Dataset) – The dataset the handler will be applied to
previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monotonic or delta checks when we need to check the previous value.
quality_manager (QualityManagerDefinition) – The quality_manager definition as specified in the pipeline config file
parameters (dict, optional) – A dictionary of handler-specific parameters specified in the pipeline config file. Defaults to {}
Class Methods
record_correction | If a correction was made to variable data to fix invalid values as detected by a quality check, this method will record the fix to the appropriate variable attribute.
run | Perform a follow-on action if a quality check fails.
Method Descriptions
record_correction(self, variable_name: str)
If a correction was made to variable data to fix invalid values as detected by a quality check, this method will record the fix to the appropriate variable attribute. The correction description will come from the handler params which get set in the pipeline config file.
- Parameters
variable_name (str) – Name of the variable whose data was corrected
abstract run(self, variable_name: str, results_array: numpy.ndarray)
Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with a missing value, emailing a contact person, or raising an exception if the failure constitutes a critical error).
- Parameters
variable_name (str) – Name of the variable that failed
results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.
class tsdat.qc.QualityManagement
Class that provides static helper functions for providing quality control checks on a tsdat-standardized xarray dataset.
Class Methods
run | Applies the Quality Managers defined in the given Config to this dataset.
Method Descriptions
static run(ds: xarray.Dataset, config: tsdat.config.Config, previous_data: xarray.Dataset) → xarray.Dataset
Applies the Quality Managers defined in the given Config to this dataset. QC results will be embedded in the dataset. QC metadata will be stored as attributes, and QC flags will be stored as a bitwise integer in new companion qc_ variables that are added to the dataset. This method will create QC companion variables if they don’t exist.
- Parameters
ds (xr.Dataset) – The dataset to apply quality managers to
config (Config) – A configuration definition (loaded from yaml)
previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monotonic or delta checks when we need to check the previous value.
- Returns
The dataset after quality checkers and handlers have been applied.
- Return type
xr.Dataset
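Storing QC flags as a bitwise integer means each quality test owns one bit of the companion variable, so several test results can share a single integer per data value. The sketch below illustrates the idea with numpy; the helper record_bit and the 1-based bit numbering are assumptions, not the tsdat implementation.

```python
import numpy as np


def record_bit(qc: np.ndarray, failed: np.ndarray, bit: int) -> np.ndarray:
    """Set the given 1-based bit in qc wherever failed is True."""
    mask = np.asarray(failed, dtype=bool)
    return np.where(mask, qc | (1 << (bit - 1)), qc)


# Companion variable for a hypothetical 'temp' variable, initially all zero.
qc_temp = np.zeros(4, dtype=np.int32)
# Test 1 (bit 1) fails for the 2nd and 4th values.
qc_temp = record_bit(qc_temp, np.array([False, True, False, True]), bit=1)
# Test 2 (bit 2) fails for the 3rd and 4th values.
qc_temp = record_bit(qc_temp, np.array([False, False, True, True]), bit=2)

# Decoding: which values failed test 2?
failed_test_2 = (qc_temp & (1 << 1)) != 0
```

A value of 0 means every recorded test passed, while 3 means both test 1 and test 2 failed for that value.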
class tsdat.qc.QualityManager(ds: xarray.Dataset, config: tsdat.config.Config, definition: tsdat.config.QualityManagerDefinition, previous_data: xarray.Dataset)
Applies a single Quality Manager to the given Dataset, as defined by the Config
- Parameters
ds (xr.Dataset) – The dataset for which we will perform quality management.
config (Config) – The Config from the pipeline definition file.
definition (QualityManagerDefinition) – Definition of the quality test this class manages.
previous_data (xr.Dataset) – A dataset from the previous processing interval (i.e., file). This is used to check for consistency between files, such as for monotonic or delta checks when we need to check the previous value.
Class Methods
run | Runs the QualityChecker and QualityHandler(s) for each specified variable as defined in the config file.
Method Descriptions
run(self) → xarray.Dataset
Runs the QualityChecker and QualityHandler(s) for each specified variable as defined in the config file.
- Returns
The dataset after the quality checker and the quality handlers have been run.
- Raises
QCError – A QCError indicates that a fatal error has occurred.
- Return type
xr.Dataset
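The checker-then-handlers flow can be sketched as a simple loop. Everything here is hypothetical scaffolding: run_manager, the callable types, and the choice to call handlers whenever a check was performed (leaving each handler to inspect the results) are assumptions about the control flow, not the library's code.

```python
from typing import Callable, Optional, Sequence

import numpy as np

Checker = Callable[[str], Optional[np.ndarray]]
Handler = Callable[[str, np.ndarray], None]


def run_manager(variables: Sequence[str], checker: Checker,
                handlers: Sequence[Handler]) -> None:
    """Run one checker over each variable and pass its results to every handler."""
    for name in variables:
        results = checker(name)
        # None means the check was skipped for this variable.
        if results is None:
            continue
        for handler in handlers:
            handler(name, results)


# Hypothetical data, checker, and handler.
data = {"a": np.array([1.0, -1.0]), "b": np.array([2.0, 3.0])}
log = []


def negative_checker(name: str) -> Optional[np.ndarray]:
    return data[name] < 0 if name in data else None


def count_failures(name: str, results: np.ndarray) -> None:
    log.append((name, int(np.count_nonzero(results))))


run_manager(["a", "b", "c"], negative_checker, [count_failures])
```

Variable c is absent from the data, so its check is skipped and no handler sees it.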
class tsdat.qc.RecordQualityResults(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: QualityHandler
Record the results of the quality check in an ancillary qc variable.
Class Methods
run | Perform a follow-on action if a quality check fails.
Method Descriptions
run(self, variable_name: str, results_array: numpy.ndarray)
Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with a missing value, emailing a contact person, or raising an exception if the failure constitutes a critical error).
- Parameters
variable_name (str) – Name of the variable that failed
results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.
class tsdat.qc.RemoveFailedValues(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: QualityHandler
Replace all the failed values with _FillValue
Class Methods
run | Perform a follow-on action if a quality check fails.
Method Descriptions
run(self, variable_name: str, results_array: numpy.ndarray)
Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with a missing value, emailing a contact person, or raising an exception if the failure constitutes a critical error).
- Parameters
variable_name (str) – Name of the variable that failed
results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.
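Replacing failed values with the fill value is a one-line masked assignment in numpy. The function name and the explicit fill_value argument below are illustrative assumptions; the real handler would take the fill value from the variable's _FillValue attribute.

```python
import numpy as np


def remove_failed_values(values: np.ndarray, results_array: np.ndarray,
                         fill_value: float) -> np.ndarray:
    """Return a copy of values with every failed entry replaced by fill_value."""
    # results_array is True where the check failed.
    return np.where(results_array, fill_value, values)


salinity = np.array([1.0, 99.0, 3.0])
failed_mask = np.array([False, True, False])
cleaned = remove_failed_values(salinity, failed_mask, fill_value=-9999.0)
```

The input array is left unchanged, which makes it easy to compare the data before and after the correction.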
class tsdat.qc.SendEmailAWS(ds: xarray.Dataset, previous_data: xarray.Dataset, quality_manager: tsdat.config.QualityManagerDefinition, parameters: Union[Dict, None] = None)
Bases: QualityHandler
Send an email to the recipients using AWS services.
Class Methods
run | Perform a follow-on action if a quality check fails.
Method Descriptions
run(self, variable_name: str, results_array: numpy.ndarray)
Perform a follow-on action if a quality check fails. This can be used to correct data if needed (such as replacing a bad value with a missing value, emailing a contact person, or raising an exception if the failure constitutes a critical error).
- Parameters
variable_name (str) – Name of the variable that failed
results_array (np.ndarray) – An array of True/False values for each data value of the variable. True means the check failed.