tsdat.config
Module that wraps objects defined in pipeline and yaml configuration files.
Submodules
Package Contents
Classes
Config – Wrapper for the pipeline configuration file.
Keys – Class that provides a handle for keys in the pipeline config file.
DimensionDefinition – Class to represent dimensions defined in the pipeline config file.
PipelineDefinition – Wrapper for the pipeline portion of the pipeline config file.
VariableDefinition – Class to encode variable definitions from the config file. Also provides a few utility methods.
DatasetDefinition – Wrapper for the dataset_definition portion of the pipeline config file.
QualityManagerDefinition – Wrapper for the quality_management portion of the pipeline config file.
-
class
tsdat.config.
Config
(dictionary: Dict)¶ Wrapper for the pipeline configuration file.
Note: in most cases, Config.load(filepath) should be used to instantiate the Config class.
- Parameters
dictionary (Dict) – The pipeline configuration file as a dictionary.
-
_parse_quality_managers
(self, dictionary: Dict) → Dict[str, tsdat.config.quality_manager_definition.QualityManagerDefinition]¶ Extracts QualityManagerDefinitions from the config file.
- Parameters
dictionary (Dict) – The quality_management dictionary.
- Returns
Mapping of quality manager name to QualityManagerDefinition
- Return type
Dict[str, QualityManagerDefinition]
-
classmethod
load
(self, filepaths: List[str])¶ Load one or more yaml pipeline configuration files. Multiple files should only be passed as input if the pipeline configuration file is split across multiple files.
- Parameters
filepaths (List[str]) – The path(s) to yaml configuration files to load.
- Returns
A Config object wrapping the yaml configuration file(s).
- Return type
Config
-
static
lint_yaml
(filename: str)¶ Lints a yaml file and raises an exception if an error is found.
- Parameters
filename (str) – The path to the file to lint.
- Raises
Exception – Raises an exception if an error is found.
-
class
tsdat.config.
Keys
Class that provides a handle for keys in the pipeline config file.
-
PIPELINE
= pipeline¶
-
DATASET_DEFINITION
= dataset_definition¶
-
DEFAULTS
= variable_defaults¶
-
QUALITY_MANAGEMENT
= quality_management¶
-
ATTRIBUTES
= attributes¶
-
DIMENSIONS
= dimensions¶
-
VARIABLES
= variables¶
-
ALL
= ALL¶
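The constants above can be used instead of hard-coded strings when indexing into a loaded configuration dictionary. The following is a minimal sketch, assuming each constant is a plain string with the value listed above; the `Keys` class here is a local stand-in rather than the tsdat class, and the `config` dictionary contents are made up for illustration.

```python
class Keys:
    """Local stand-in mirroring the constants listed above (not the tsdat class)."""

    PIPELINE = "pipeline"
    DATASET_DEFINITION = "dataset_definition"
    DEFAULTS = "variable_defaults"
    QUALITY_MANAGEMENT = "quality_management"
    ATTRIBUTES = "attributes"
    DIMENSIONS = "dimensions"
    VARIABLES = "variables"
    ALL = "ALL"


# Illustrative configuration dictionary; every value here is hypothetical.
config = {
    "pipeline": {"type": "Ingest"},
    "dataset_definition": {
        "attributes": {"title": "Example"},
        "dimensions": {"time": {"length": "unlimited"}},
        "variables": {"time": {"dims": ["time"]}},
    },
}

# Index by constant rather than by string literal.
dataset = config[Keys.DATASET_DEFINITION]
dimension_names = list(dataset[Keys.DIMENSIONS])

# This example config has no quality_management section, so fall back to {}.
quality = config.get(Keys.QUALITY_MANAGEMENT, {})
```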
-
class
tsdat.config.
DimensionDefinition
(name: str, length: Union[str, int])¶ Class to represent dimensions defined in the pipeline config file.
- Parameters
name (str) – The name of the dimension
length (Union[str, int]) – The length of the dimension. This should be one of: "unlimited", "variable", or a positive int. The ‘time’ dimension should always have a length of "unlimited".
-
is_unlimited
(self) → bool Returns True if the dimension has unlimited length. Represented by setting the length attribute to "unlimited".
- Returns
True if the dimension has unlimited length, False otherwise.
- Return type
bool
-
is_variable_length
(self) → bool Returns True if the dimension has variable length, meaning that the dimension’s length is set at runtime. Represented by setting the length to "variable".
- Returns
True if the dimension has variable length, False otherwise.
- Return type
bool
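The two checks described above amount to comparing the length against the sentinel strings. The following is a minimal sketch under that assumption; `DimensionSketch` is a hypothetical stand-in and not the tsdat `DimensionDefinition` class.

```python
from typing import Union


class DimensionSketch:
    """Hypothetical stand-in for DimensionDefinition, for illustration only."""

    def __init__(self, name: str, length: Union[str, int]):
        self.name = name
        self.length = length  # "unlimited", "variable", or a positive int

    def is_unlimited(self) -> bool:
        # Unlimited dimensions are marked with the string "unlimited".
        return self.length == "unlimited"

    def is_variable_length(self) -> bool:
        # Variable-length dimensions get their length at runtime.
        return self.length == "variable"


time_dim = DimensionSketch("time", "unlimited")
depth_dim = DimensionSketch("depth", 3)
```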
-
class
tsdat.config.
PipelineDefinition
(dictionary: Dict[str, Dict])¶ Wrapper for the pipeline portion of the pipeline config file.
- Parameters
dictionary (Dict[str, Dict]) – The pipeline component of the pipeline config file.
- Raises
DefinitionError – Raises DefinitionError if one of the file naming components contains an illegal character.
-
check_file_name_components
(self)¶ Performs sanity checks on the config properties used in naming files output by tsdat pipelines.
- Raises
DefinitionError – Raises DefinitionError if a component has been set improperly.
-
class
tsdat.config.
VariableDefinition
(name: str, dictionary: Dict, available_dimensions: Dict[str, tsdat.config.dimension_definition.DimensionDefinition], defaults: Dict = {})¶ Class to encode variable definitions from the config file. Also provides a few utility methods.
- Parameters
name (str) – The name of the variable in the output file.
dictionary (Dict) – The dictionary entry corresponding with this variable in the config file.
available_dimensions (Dict[str, DimensionDefinition]) – A mapping of dimension name to DimensionDefinition objects.
defaults (Dict, optional) – The defaults to use when instantiating this VariableDefinition object, defaults to {}.
-
_parse_input
(self, dictionary: Dict, defaults: Dict = {}) → VarInput¶ Parses the variable’s input property, if it has one, from the variable dictionary.
- Parameters
dictionary (Dict) – The dictionary entry corresponding with this variable in the config file.
defaults (Dict, optional) – The defaults to use when instantiating the VariableDefinition object, defaults to {}.
- Returns
A VarInput object for this VariableDefinition, or None.
- Return type
VarInput
-
_parse_attributes
(self, dictionary: Dict, defaults: Dict = {}) → Dict[str, Any]¶ Parses the variable’s attributes from the variable dictionary.
- Parameters
dictionary (Dict) – The dictionary entry corresponding with this variable in the config file.
defaults (Dict, optional) – The defaults to use when instantiating the VariableDefinition object, defaults to {}.
- Returns
A mapping of attribute name to attribute value.
- Return type
Dict[str, Any]
-
_parse_dimensions
(self, dictionary: Dict, available_dimensions: Dict[str, tsdat.config.dimension_definition.DimensionDefinition], defaults: Dict = {}) → Dict[str, tsdat.config.dimension_definition.DimensionDefinition]¶ Parses the variable’s dimensions from the variable dictionary.
- Parameters
dictionary (Dict) – The dictionary entry corresponding with this variable in the config file.
available_dimensions – A mapping of dimension name to DimensionDefinition.
defaults (Dict, optional) – The defaults to use when instantiating the VariableDefinition object, defaults to {}.
- Returns
A mapping of dimension name to DimensionDefinition objects.
- Return type
Dict[str, DimensionDefinition]
-
_parse_data_type
(self, dictionary: Dict, defaults: Dict = {}) → object Parses the data_type string and returns the appropriate numpy data type (e.g., “float” -> np.float).
- Parameters
dictionary (Dict) – The dictionary entry corresponding with this variable in the config file.
defaults (Dict, optional) – The defaults to use when instantiating the VariableDefinition object, defaults to {}.
- Raises
KeyError – Raises KeyError if the data type in the dictionary does not match a valid data type.
- Returns
The numpy data type corresponding with the type provided in the yaml file, or data_type if the provided data_type is not in the ME Data Standards list of data types.
- Return type
object
-
add_fillvalue_if_none
(self, attributes: Dict[str, Any]) → Dict[str, Any]¶ Adds the _FillValue attribute to the provided attributes dictionary if the _FillValue attribute has not already been defined and returns the modified attributes dictionary.
- Parameters
attributes (Dict[str, Any]) – The dictionary containing user-defined variable attributes.
- Returns
The dictionary containing user-defined variable attributes. Is guaranteed to have a _FillValue attribute.
- Return type
Dict[str, Any]
-
is_constant
(self) → bool¶ Returns True if the variable is a constant. A variable is constant if it does not have any dimensions.
- Returns
True if the variable is constant, False otherwise.
- Return type
bool
-
is_predefined
(self) → bool¶ Returns True if the variable’s data was predefined in the config yaml file.
- Returns
True if the variable is predefined, False otherwise.
- Return type
bool
-
is_coordinate
(self) → bool¶ Returns True if the variable is a coordinate variable. A variable is defined as a coordinate variable if it is dimensioned by itself.
- Returns
True if the variable is a coordinate variable, False otherwise.
- Return type
bool
-
is_derived
(self) → bool¶ Return True if the variable is derived. A variable is derived if it does not have an input and it is not predefined.
- Returns
True if the Variable is derived, False otherwise.
- Return type
bool
-
has_converter
(self) → bool¶ Returns True if the variable has an input converter defined, False otherwise.
- Returns
True if the Variable has a converter defined, False otherwise.
- Return type
bool
-
is_required
(self) → bool Returns True if the variable has the ‘required’ property defined and the ‘required’ property evaluates to True. A required variable is a variable which must be retrieved from the input dataset. If a required variable is not in the input dataset, the process should crash.
- Returns
True if the variable is required, False otherwise.
- Return type
bool
-
has_input
(self) → bool¶ Return True if the variable is copied from an input dataset, regardless of whether or not unit and/or naming conversions should be applied.
- Returns
True if the Variable has an input defined, False otherwise.
- Return type
bool
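The predicates above are defined in terms of one another: a variable is derived when it has neither an input nor predefined data, and it is a coordinate when it is dimensioned by itself. The following is a minimal sketch of those relationships. `VariableSketch`, its attribute names, and the exact comparisons (for example `dims == [name]` for coordinates) are assumptions made for illustration and are not taken from the tsdat source.

```python
from typing import Any, Dict, List, Optional


class VariableSketch:
    """Hypothetical stand-in for VariableDefinition, for illustration only."""

    def __init__(self, name: str, entry: Dict[str, Any]):
        self.name = name
        self.dims: List[str] = list(entry.get("dims", []))
        self.input: Optional[Dict[str, Any]] = entry.get("input")
        self.data: Any = entry.get("data")

    def is_constant(self) -> bool:
        # Constant: the variable has no dimensions.
        return len(self.dims) == 0

    def is_predefined(self) -> bool:
        # Predefined: the data was given directly in the config entry.
        return self.data is not None

    def is_coordinate(self) -> bool:
        # Coordinate: the variable is dimensioned by itself (assumed check).
        return self.dims == [self.name]

    def has_input(self) -> bool:
        return self.input is not None

    def is_derived(self) -> bool:
        # Derived: no input and not predefined.
        return not self.has_input() and not self.is_predefined()


time = VariableSketch("time", {"dims": ["time"], "input": {"name": "Timestamp"}})
depth = VariableSketch("depth", {"dims": ["depth"], "data": [4, 8, 12]})
speed = VariableSketch("speed", {"dims": ["time", "depth"]})
site_id = VariableSketch("site_id", {"data": 7})
```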
-
get_input_name
(self) → str¶ Returns the name of the variable in the input if defined, otherwise returns None.
- Returns
The name of the variable in the input, or None.
- Return type
str
-
get_input_units
(self) → str¶ If the variable has input, returns the units of the input variable or the output units if no input units are defined.
- Returns
The units of the input variable data.
- Return type
str
-
get_output_units
(self) → str¶ Returns the units of the output data or None if no units attribute has been defined.
- Returns
The units of the output variable data.
- Return type
str
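The fallback between input and output units described above can be sketched with plain dictionaries. The helper functions below are hypothetical, the `input`, `attrs`, and `units` key names are assumptions about the config layout, and returning None for a variable without an input is also an assumption, since that case is not specified above.

```python
from typing import Any, Dict, Optional


def get_output_units(entry: Dict[str, Any]) -> Optional[str]:
    # Output units come from the variable's attributes; None if not defined.
    return entry.get("attrs", {}).get("units")


def get_input_units(entry: Dict[str, Any]) -> Optional[str]:
    var_input = entry.get("input")
    if var_input is None:
        return None  # assumption: no input means no input units
    # Fall back to the output units when the input does not define units.
    return var_input.get("units", get_output_units(entry))


with_units = {"input": {"name": "T", "units": "degF"}, "attrs": {"units": "degC"}}
without_units = {"input": {"name": "T"}, "attrs": {"units": "degC"}}
```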
-
get_coordinate_names
(self) → List[str]¶ Returns the names of the coordinate VariableDefinition(s) that this VariableDefinition is dimensioned by.
- Returns
A list of dimension/coordinate variable names.
- Return type
List[str]
-
get_shape
(self) → Tuple[int]¶ Returns the shape of the data attribute on the VariableDefinition.
- Raises
KeyError – Raises a KeyError if the data attribute has not been set yet.
- Returns
The shape of the VariableDefinition’s data, or None.
- Return type
Tuple[int]
-
get_data_type
(self) → numpy.dtype¶ Retrieves the variable’s data type.
- Returns
Returns the data type of the variable’s data as a numpy dtype.
- Return type
np.dtype
-
get_FillValue
(self) → int¶ Retrieves the variable’s _FillValue attribute, using -9999 as a default if it has not been defined.
- Returns
Returns the variable’s _FillValue.
- Return type
int
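The _FillValue handling described for add_fillvalue_if_none and get_FillValue can be sketched as a single dictionary operation. This is a minimal sketch, assuming that add_fillvalue_if_none uses the same -9999 default documented for get_FillValue and that it modifies the dictionary in place; the function below is a standalone illustration, not the tsdat method.

```python
from typing import Any, Dict

DEFAULT_FILL_VALUE = -9999  # default documented for get_FillValue


def add_fillvalue_if_none(attributes: Dict[str, Any]) -> Dict[str, Any]:
    # setdefault only writes the key when it is missing, so a user-defined
    # _FillValue is left untouched.
    attributes.setdefault("_FillValue", DEFAULT_FILL_VALUE)
    return attributes


filled = add_fillvalue_if_none({"units": "m"})
kept = add_fillvalue_if_none({"units": "m", "_FillValue": -1})
```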
-
run_converter
(self, data: numpy.ndarray) → numpy.ndarray¶ If the variable has an input converter, runs the input converter for the input/output units on the provided data.
- Parameters
data (np.ndarray) – The data to be converted.
- Returns
Returns the data after it has been run through the variable’s converter.
- Return type
np.ndarray
-
to_dict
(self) → Dict Returns the Variable as a dictionary to be used to initialize an empty xarray Dataset or DataArray.
Returns a dictionary like (Example is for temperature):
{ "dims": ["time"], "data": [], "attrs": {"units": "degC"} }
- Returns
A dictionary representation of the variable.
- Return type
Dict
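The dictionary shape shown above can be reproduced with a small helper. This is a minimal sketch; `variable_to_dict` is a hypothetical function that takes the dimension names and attributes directly, whereas the real method reads them from the VariableDefinition.

```python
from typing import Any, Dict, List


def variable_to_dict(dims: List[str], attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper that builds the dictionary shape shown above."""
    # Copy the inputs so the caller's objects are not shared with the result.
    return {"dims": list(dims), "data": [], "attrs": dict(attrs)}


temperature = variable_to_dict(["time"], {"units": "degC"})
```

A dictionary with "dims", "data", and "attrs" keys matches the layout that xarray's DataArray.from_dict accepts, which is presumably why this shape is used.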
-
class
tsdat.config.
DatasetDefinition
(dictionary: Dict, datastream_name: str)¶ Wrapper for the dataset_definition portion of the pipeline config file.
- Parameters
dictionary (Dict) – The portion of the config file corresponding with the dataset definition.
datastream_name (str) – The name of the datastream that the config file is for.
-
_parse_dimensions
(self, dictionary: Dict) → Dict[str, tsdat.config.dimension_definition.DimensionDefinition]¶ Extracts the dimensions from the dataset_definition portion of the config file.
- Parameters
dictionary (Dict) – The dataset_definition dictionary from the config file.
- Returns
Returns a mapping of output dimension names to DimensionDefinition objects.
- Return type
Dict[str, DimensionDefinition]
-
_parse_variables
(self, dictionary: Dict, available_dimensions: Dict[str, tsdat.config.dimension_definition.DimensionDefinition]) → Dict[str, tsdat.config.variable_definition.VariableDefinition]¶ Extracts the variables from the dataset_definition portion of the config file.
- Parameters
dictionary (Dict) – The dataset_definition dictionary from the config file.
available_dimensions (Dict[str, DimensionDefinition]) – The DimensionDefinition objects that have already been parsed.
- Returns
Returns a mapping of output variable names to VariableDefinition objects.
- Return type
Dict[str, VariableDefinition]
-
_parse_coordinates
(self, vars: Dict[str, tsdat.config.variable_definition.VariableDefinition]) → Tuple[Dict[str, tsdat.config.variable_definition.VariableDefinition], Dict[str, tsdat.config.variable_definition.VariableDefinition]]¶ Separates coordinate variables and data variables.
Determines which variables are coordinate variables and moves those variables from self.vars to self.coords. Coordinate variables are defined as variables that are dimensioned by themselves, i.e., var.name == var.dim.name is a true statement for coordinate variables, but false for data variables.
- Parameters
vars (Dict[str, VariableDefinition]) – The dictionary of VariableDefinition objects to check.
- Returns
The coordinate variables and the data variables, each as a dictionary of VariableDefinition objects.
- Return type
Tuple[Dict[str, VariableDefinition], Dict[str, VariableDefinition]]
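The separation rule described above can be sketched on a plain mapping of variable name to dimension names. The `split_coordinates` function, the `VarDims` alias, and the order of the returned tuple (coordinates first) are all assumptions for illustration; the order is not stated above.

```python
from typing import Dict, List, Tuple

VarDims = Dict[str, List[str]]  # hypothetical: variable name -> dimension names


def split_coordinates(variables: VarDims) -> Tuple[VarDims, VarDims]:
    # A coordinate variable is dimensioned only by itself.
    coords = {name: dims for name, dims in variables.items() if dims == [name]}
    # Everything else is a data variable.
    data_vars = {name: dims for name, dims in variables.items() if name not in coords}
    return coords, data_vars


variables = {"time": ["time"], "depth": ["depth"], "speed": ["time", "depth"]}
coords, data_vars = split_coordinates(variables)
```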
-
_validate_dataset_definition
(self)¶ Performs sanity checks on the DatasetDefinition object.
- Raises
DefinitionError – If any sanity checks fail.
-
get_attr
(self, attribute_name) → Any¶ Retrieves the value of the attribute requested, or None if it does not exist.
- Parameters
attribute_name (str) – The name of the attribute to retrieve.
- Returns
The value of the attribute, or None.
- Return type
Any
-
get_variable_names
(self) → List[str]¶ Retrieves the list of variable names. Note that this excludes coordinate variables.
- Returns
The list of variable names.
- Return type
List[str]
-
get_variable
(self, variable_name: str) → tsdat.config.variable_definition.VariableDefinition Attempts to retrieve the requested variable. First searches the data variables, then searches the coordinate variables. Returns None if no data or coordinate variables have been defined with the requested variable name.
- Parameters
variable_name (str) – The name of the variable to retrieve.
- Returns
Returns the VariableDefinition for the variable, or None if the variable could not be found.
- Return type
VariableDefinition
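The lookup order described above (data variables first, then coordinate variables, then None) can be sketched with two dictionaries. This `get_variable` function is a standalone, hypothetical illustration that takes the two mappings as arguments; the real method reads them from the DatasetDefinition.

```python
from typing import Any, Dict, Optional


def get_variable(
    name: str, data_vars: Dict[str, Any], coords: Dict[str, Any]
) -> Optional[Any]:
    # Search the data variables first.
    variable = data_vars.get(name)
    if variable is None:
        # Fall back to the coordinate variables; returns None if absent there too.
        variable = coords.get(name)
    return variable


# Placeholder values stand in for VariableDefinition objects.
data_vars = {"speed": "speed-definition"}
coords = {"time": "time-definition"}
```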
-
get_coordinates
(self, variable: tsdat.config.variable_definition.VariableDefinition) → List[tsdat.config.variable_definition.VariableDefinition]¶ Returns the coordinate VariableDefinition object(s) that dimension the requested VariableDefinition.
- Parameters
variable (VariableDefinition) – The VariableDefinition whose coordinate variables should be retrieved.
- Returns
A list of VariableDefinition coordinate variables that dimension the provided VariableDefinition.
- Return type
List[VariableDefinition]
-
get_static_variables
(self) → List[tsdat.config.variable_definition.VariableDefinition]¶ Retrieves a list of static VariableDefinition objects. A variable is defined as static if it has a “data” section in the config file, which would mean that the variable’s data is defined statically. For example, in the config file snippet below, “depth” is a static variable:
    depth:
      data: [4, 8, 12]
      dims: [depth]
      type: int
      attrs:
        long_name: Depth
        units: m
- Returns
The list of static VariableDefinition objects.
- Return type
List[VariableDefinition]
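The static-variable rule above can be sketched against the dictionary that a config snippet like the one shown would load to. This is a minimal sketch; `static_variable_names` is a hypothetical helper that works on raw config entries and returns names rather than VariableDefinition objects, and the `speed` entry is made up for contrast.

```python
from typing import Any, Dict, List

# Dictionary equivalent of the "depth" snippet, plus a made-up non-static variable.
variables: Dict[str, Dict[str, Any]] = {
    "depth": {
        "data": [4, 8, 12],
        "dims": ["depth"],
        "type": "int",
        "attrs": {"long_name": "Depth", "units": "m"},
    },
    "speed": {"dims": ["time", "depth"], "type": "float"},
}


def static_variable_names(entries: Dict[str, Dict[str, Any]]) -> List[str]:
    # A variable is static when its config entry has a "data" section.
    return [name for name, entry in entries.items() if "data" in entry]


static_names = static_variable_names(variables)
```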
-
class
tsdat.config.
QualityManagerDefinition
(name: str, dictionary: Dict)¶ Wrapper for the quality_management portion of the pipeline config file.
- Parameters
name (str) – The name of the quality manager in the config file.
dictionary (Dict) – The dictionary contents of the quality manager from the config file.