Notes on Tsdat’s Quality Control Methods
In Tsdat, each variable is given a corresponding quality control (QC) variable when QC tests for coordinates or data variables are run in a given pipeline. This means that data variables will not get a corresponding QC variable if the respective QC blocks for “DATA_VARS” are commented out. The basic quality.yaml file provided in the template is shown below:
managers:
  #---------------------------------------------------------------
  - name: Fail if missing coordinates
    checker:
      classname: tsdat.qc.checkers.CheckMissing
    handlers:
      - classname: tsdat.qc.handlers.FailPipeline
        parameters:
          context: Coordinate variables cannot be missing.
    apply_to:
      - COORDS

  - name: Fail if monotonic coordinates
    checker:
      classname: tsdat.qc.checkers.CheckMonotonic
      parameters:
        require_increasing: true
    handlers:
      - classname: tsdat.qc.handlers.FailPipeline
        parameters:
          context: Coordinate variables must be strictly increasing.
    apply_to:
      - COORDS

  #---------------------------------------------------------------
  - name: Remove missing data
    checker:
      classname: tsdat.qc.checkers.CheckMissing
    handlers:
      - classname: tsdat.qc.handlers.RemoveFailedValues
      - classname: tsdat.qc.handlers.RecordQualityResults
        parameters:
          bit: 1
          assessment: bad
          meaning: "Value is equal to _FillValue or NaN"
    apply_to:
      - DATA_VARS

  - name: Flag data below minimum valid threshold
    checker:
      classname: tsdat.qc.checkers.CheckValidMin
    handlers:
      - classname: tsdat.qc.handlers.RecordQualityResults
        parameters:
          bit: 2
          assessment: bad
          meaning: "Value is less than the valid_min."
    apply_to:
      - DATA_VARS

  - name: Flag data above maximum valid threshold
    checker:
      classname: tsdat.qc.checkers.CheckValidMax
    handlers:
      - classname: tsdat.qc.handlers.RecordQualityResults
        parameters:
          bit: 3
          assessment: bad
          meaning: "Value is greater than the valid_max."
    apply_to:
      - DATA_VARS
A QC block consists of:
- the keyword “name”: simply the block’s description
- the keyword “checker”, with an associated “classname” one line below it: the QC test to use
- the keyword “handlers”, with an associated list of “classname”(s) (hence the extra hyphen in front of each “classname”): the actions to take when the test fails
- the keyword “apply_to”: this can be “COORDS”, “DATA_VARS”, or a list of variable names
These QC blocks can take additional parameters for more sophisticated QC algorithms. Custom QC classes defined in a “qc.py” file can expose editable parameters, which are set using the “parameters” keyword. In the QC block below, the QC class “CheckCorrelation” exists in the “<pipeline_name>/shared/qc.py” file. The classname is therefore set as “shared.qc.CheckCorrelation”.
The “exclude” keyword can be used to exclude certain variables from a QC test, and is typically needed for variables that are not numeric, such as chars and strings.
Also note that YAML uses indentation to express nesting, but the exact number of spaces per level is not critical. It is good practice to stay consistent with however your file is already indented.
  - name: Flag data below correlation threshold
    checker:
      classname: shared.qc.CheckCorrelation
      parameters:
        correlation_threshold: 30
    handlers:
      - classname: tsdat.qc.handlers.RemoveFailedValues
      - classname: tsdat.qc.handlers.RecordQualityResults
        parameters:
          bit: 4
          assessment: bad
          meaning: "Value is less than correlation threshold"
    apply_to: [vel, corr, amp]
    exclude: [vel_bt]
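To make the “apply_to” and “exclude” semantics concrete, here is a minimal Python sketch of how a list of target variables could be resolved from those two keywords. It is an illustration only and is not Tsdat’s actual implementation: the function name `resolve_targets` and the example variable names `time`, `range`, and `station_name` are hypothetical.

```python
from typing import List, Sequence


def resolve_targets(
    apply_to: Sequence[str],
    coords: Sequence[str],
    data_vars: Sequence[str],
    exclude: Sequence[str] = (),
) -> List[str]:
    """Expand COORDS / DATA_VARS keywords and drop excluded names.

    Hypothetical helper written for illustration; not part of Tsdat.
    """
    selected: List[str] = []
    for entry in apply_to:
        if entry == "COORDS":
            candidates = list(coords)
        elif entry == "DATA_VARS":
            candidates = list(data_vars)
        else:
            candidates = [entry]  # an explicit variable name
        for name in candidates:
            # Skip excluded variables and avoid listing a name twice.
            if name not in exclude and name not in selected:
                selected.append(name)
    return selected


# Hypothetical dataset layout, chosen only for this example.
coords = ["time", "range"]
data_vars = ["vel", "corr", "amp", "vel_bt", "station_name"]

print(resolve_targets(["COORDS"], coords, data_vars))
print(resolve_targets(["DATA_VARS"], coords, data_vars, exclude=["station_name"]))
print(resolve_targets(["vel", "corr", "amp"], coords, data_vars, exclude=["vel_bt"]))
```

In this sketch, excluding the string variable `station_name` keeps a numeric test from being run on non-numeric data, which is the typical use of “exclude” described above.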
Finally, it’s important to go over the parameters required for RecordQualityResults, which is the built-in handler that all QC blocks should use to record the QC test results.
It takes three parameters: “bit”, “assessment”, and “meaning”. These parameters are turned into variable attributes in the pipeline output dataset: “flag_masks”, “flag_assessments”, and “flag_meanings”, respectively.
“Bit” is shorthand for the QC bit, which is numbered sequentially from 1 to “n”, depending on how many tests a pipeline has. The flag mask for each bit is calculated as 2^(bit-1). So for the bits 1, 2, 3, and 4, the associated flag masks will be 1, 2, 4, and 8. If a QC value is 13, then that means the datapoint failed the tests associated with flag masks 1, 4, and 8 (1 + 4 + 8 = 13), which are QC bits 1, 3, and 4. This scheme works because each flag mask is a distinct power of two, so any sum of flag masks corresponds to exactly one set of QC bits.
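The bit arithmetic can be checked with a short Python sketch. The helper names `flag_mask` and `failed_bits` are hypothetical and written only for this example; they are not part of Tsdat.

```python
from typing import List


def flag_mask(bit: int) -> int:
    """Return the flag mask for a 1-based QC bit: 2 ** (bit - 1)."""
    if bit < 1:
        raise ValueError("QC bits start at 1")
    return 1 << (bit - 1)


def failed_bits(qc_value: int) -> List[int]:
    """List the 1-based QC bits whose masks are set in qc_value."""
    if qc_value < 0:
        raise ValueError("QC values cannot be negative")
    bits: List[int] = []
    bit = 1
    while qc_value:
        if qc_value & 1:  # lowest remaining bit is set
            bits.append(bit)
        qc_value >>= 1
        bit += 1
    return bits


print([flag_mask(b) for b in (1, 2, 3, 4)])  # [1, 2, 4, 8]
print(failed_bits(13))  # [1, 3, 4]
```

A QC value of 0 yields an empty list, meaning the datapoint passed every test.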
“Assessment” is one of two terms: “bad” or “indeterminate”. It indicates whether a failed test means the datapoint is of bad quality (“bad”) or only that it may be cause for concern (“indeterminate”).
“Meaning” is the description of the failure. This is a short statement of which test failed, and the entries in “flag_meanings” are listed in the same order as those in “flag_masks”.
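Because the three attributes are kept in the same order, a QC value can be translated back into readable failure messages. The following Python sketch assumes the attributes are available as parallel lists in a plain dictionary. The dictionary layout, the key names, and the `describe_failures` helper are assumptions made for illustration; the meanings are copied from the QC blocks shown earlier.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute dictionary for one QC variable, built from the
# four tests (bits 1 to 4) defined in the YAML examples.
qc_attrs: Dict[str, list] = {
    "flag_masks": [1, 2, 4, 8],
    "flag_assessments": ["bad", "bad", "bad", "bad"],
    "flag_meanings": [
        "Value is equal to _FillValue or NaN",
        "Value is less than the valid_min.",
        "Value is greater than the valid_max.",
        "Value is less than correlation threshold",
    ],
}


def describe_failures(qc_value: int, attrs: Dict[str, list]) -> List[Tuple[str, str]]:
    """Return (assessment, meaning) pairs for every mask set in qc_value."""
    rows = zip(attrs["flag_masks"], attrs["flag_assessments"], attrs["flag_meanings"])
    return [(assessment, meaning) for mask, assessment, meaning in rows if qc_value & mask]


# A QC value of 10 = 2 + 8, so the tests with flag masks 2 and 8 failed.
for assessment, meaning in describe_failures(10, qc_attrs):
    print(f"{assessment}: {meaning}")
```

The lookup only works if the lists stay aligned, which is why the ordering of the meanings matters.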