tsdat.config

Module that wraps objects defined in pipeline and yaml configuration files.

Package Contents

Classes

Config

Wrapper for the pipeline configuration file.

Keys

Class that provides a handle for keys in the pipeline config file.

DimensionDefinition

Class to represent dimensions defined in the pipeline config file.

PipelineDefinition

Wrapper for the pipeline portion of the pipeline config file.

VariableDefinition

Class to encode variable definitions from the config file. Also provides a few utility methods.

DatasetDefinition

Wrapper for the dataset_definition portion of the pipeline config file.

QualityManagerDefinition

Wrapper for the quality_management portion of the pipeline config file.

class tsdat.config.Config(dictionary: Dict)

Wrapper for the pipeline configuration file.

Note: in most cases, Config.load(filepaths) should be used to instantiate the Config class.

Parameters

dictionary (Dict) – The pipeline configuration file as a dictionary.

_parse_quality_managers(self, dictionary: Dict) → Dict[str, tsdat.config.quality_manager_definition.QualityManagerDefinition]

Extracts QualityManagerDefinitions from the config file.

Parameters

dictionary (Dict) – The quality_management dictionary.

Returns

Mapping of quality manager name to QualityManagerDefinition

Return type

Dict[str, QualityManagerDefinition]
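The name-to-definition mapping described above can be sketched as follows. This is a minimal illustration, not tsdat's implementation: it does not import tsdat, it assumes the quality_management section is a dict of manager name to manager settings, and it uses a hypothetical stand-in dataclass (QualityManagerStub) in place of the real QualityManagerDefinition. The manager names and their settings are made up.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class QualityManagerStub:
    """Hypothetical stand-in for QualityManagerDefinition(name, dictionary)."""
    name: str
    dictionary: Dict[str, Any]


def parse_quality_managers(quality_management: Dict[str, Dict[str, Any]]) -> Dict[str, QualityManagerStub]:
    # One definition object per entry, keyed by the manager's name in the config.
    return {
        name: QualityManagerStub(name, settings)
        for name, settings in quality_management.items()
    }


# Hypothetical quality_management section; names and settings are illustrative only.
managers = parse_quality_managers({
    "check_missing": {"variables": ["ALL"]},
    "check_range": {"variables": ["temperature"]},
})
print(sorted(managers))  # ['check_missing', 'check_range']
```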

classmethod load(self, filepaths: List[str])

Load one or more yaml pipeline configuration files. Multiple files should only be passed as input if the pipeline configuration file is split across multiple files.

Parameters

filepaths (List[str]) – The path(s) to yaml configuration files to load.

Returns

A Config object wrapping the yaml configuration file(s).

Return type

Config
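When the configuration is split across several files, the parsed pieces have to be combined into one dictionary before it is wrapped by Config. The sketch below assumes each file has already been parsed into a dict (for example by a YAML loader) and performs a shallow merge of top-level keys in which later files override earlier ones. The helper name merge_config_parts and the merge strategy are assumptions for illustration, not documented tsdat behavior.

```python
from typing import Any, Dict, List


def merge_config_parts(parts: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: combine parsed config files into one dictionary."""
    merged: Dict[str, Any] = {}
    for part in parts:
        # Shallow merge: a top-level key that appears in more than one file
        # is replaced wholesale by the later file's value.
        merged.update(part)
    return merged


# Placeholder contents standing in for two parsed YAML files.
pipeline_part = {"pipeline": {"placeholder": True}}
dataset_part = {"dataset_definition": {"attributes": {}, "dimensions": {}, "variables": {}}}
config_dict = merge_config_parts([pipeline_part, dataset_part])
print(sorted(config_dict))  # ['dataset_definition', 'pipeline']
```

A shallow merge keeps the rule simple (each top-level section should live in exactly one file), which matches the guidance that multiple files are only for a configuration split across files.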

static lint_yaml(filename: str)

Lints a yaml file and raises an exception if an error is found.

Parameters

filename (str) – The path to the file to lint.

Raises

Exception – Raises an exception if an error is found.

class tsdat.config.Keys

Class that provides a handle for keys in the pipeline config file.

PIPELINE = 'pipeline'
DATASET_DEFINITION = 'dataset_definition'
DEFAULTS = 'variable_defaults'
QUALITY_MANAGEMENT = 'quality_management'
ATTRIBUTES = 'attributes'
DIMENSIONS = 'dimensions'
VARIABLES = 'variables'
ALL = 'ALL'
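These constants replace hard-coded key strings when navigating the parsed configuration dictionary. The sketch below uses a local stand-in class that copies the values listed above (it does not import tsdat) together with a small hypothetical configuration; the "length" key under a dimension is an assumption.

```python
class Keys:
    """Local stand-in mirroring the tsdat.config.Keys values documented above."""
    PIPELINE = "pipeline"
    DATASET_DEFINITION = "dataset_definition"
    DEFAULTS = "variable_defaults"
    QUALITY_MANAGEMENT = "quality_management"
    ATTRIBUTES = "attributes"
    DIMENSIONS = "dimensions"
    VARIABLES = "variables"
    ALL = "ALL"


# Hypothetical parsed configuration; only the key names come from the listing above.
config = {
    Keys.DATASET_DEFINITION: {
        Keys.ATTRIBUTES: {"title": "Example"},
        Keys.DIMENSIONS: {"time": {"length": "unlimited"}},
        Keys.VARIABLES: {},
    }
}

dataset = config[Keys.DATASET_DEFINITION]
print(list(dataset[Keys.DIMENSIONS]))  # ['time']
```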

class tsdat.config.DimensionDefinition(name: str, length: Union[str, int])

Class to represent dimensions defined in the pipeline config file.

Parameters
  • name (str) – The name of the dimension

  • length (Union[str, int]) – The length of the dimension. This should be one of: "unlimited", "variable", or a positive int. The ‘time’ dimension should always have a length of "unlimited".

is_unlimited(self) → bool

Returns True if the dimension has unlimited length, which is represented by setting the length attribute to "unlimited".

Returns

True if the dimension has unlimited length.

Return type

bool

is_variable_length(self) → bool

Returns True if the dimension has variable length, meaning that the dimension’s length is set at runtime. Represented by setting the length to "variable".

Returns

True if the dimension has variable length, False otherwise.

Return type

bool
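The two predicates above reduce to string comparisons on the length attribute. The sketch below re-creates them in a hypothetical DimensionSketch class that does not import tsdat; the up-front validation of length and the choice of ValueError are assumptions added for illustration, since this page does not say how invalid lengths are reported.

```python
from typing import Union


class DimensionSketch:
    """Hypothetical re-creation of the documented DimensionDefinition checks."""

    def __init__(self, name: str, length: Union[str, int]):
        # Accept only the documented forms: "unlimited", "variable", or a positive int.
        valid_string = length in ("unlimited", "variable")
        valid_int = isinstance(length, int) and not isinstance(length, bool) and length > 0
        if not (valid_string or valid_int):
            raise ValueError(f"Invalid length for dimension '{name}': {length!r}")
        self.name = name
        self.length = length

    def is_unlimited(self) -> bool:
        return self.length == "unlimited"

    def is_variable_length(self) -> bool:
        return self.length == "variable"


time = DimensionSketch("time", "unlimited")
depth = DimensionSketch("depth", 3)
print(time.is_unlimited(), depth.is_unlimited())  # True False
```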

class tsdat.config.PipelineDefinition(dictionary: Dict[str, Dict])

Wrapper for the pipeline portion of the pipeline config file.

Parameters

dictionary (Dict[str, Dict]) – The pipeline component of the pipeline config file.

Raises

DefinitionError – Raises DefinitionError if one of the file naming components contains an illegal character.

check_file_name_components(self)

Performs sanity checks on the config properties used in naming files output by tsdat pipelines.

Raises

DefinitionError – Raises DefinitionError if a component has been set improperly.

class tsdat.config.VariableDefinition(name: str, dictionary: Dict, available_dimensions: Dict[str, tsdat.config.dimension_definition.DimensionDefinition], defaults: Union[Dict, None] = None)

Class to encode variable definitions from the config file. Also provides a few utility methods.

Parameters
  • name (str) – The name of the variable in the output file.

  • dictionary (Dict) – The dictionary entry corresponding with this variable in the config file.

  • available_dimensions (Dict[str, DimensionDefinition]) – A mapping of dimension name to DimensionDefinition objects.

  • defaults (Dict, optional) – The defaults to use when instantiating this VariableDefinition object, defaults to {}.

_parse_input(self, dictionary: Dict, defaults: Union[Dict, None] = None) → VarInput

Parses the variable’s input property, if it has one, from the variable dictionary.

Parameters
  • dictionary (Dict) – The dictionary entry corresponding with this variable in the config file.

  • defaults (Dict, optional) – The defaults to use when instantiating the VariableDefinition object, defaults to {}.

Returns

A VarInput object for this VariableDefinition, or None.

Return type

VarInput

_parse_attributes(self, dictionary: Dict, defaults: Union[Dict, None] = None) → Dict[str, Any]

Parses the variable’s attributes from the variable dictionary.

Parameters
  • dictionary (Dict) – The dictionary entry corresponding with this variable in the config file.

  • defaults (Dict, optional) – The defaults to use when instantiating the VariableDefinition object, defaults to {}.

Returns

A mapping of attribute name to attribute value.

Return type

Dict[str, Any]
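The defaults parameter implies that attribute values from a defaults section are combined with the variable's own attributes. The sketch below is a hypothetical helper (merge_attributes) that assumes both the defaults and the variable keep their attributes in plain dicts under an "attrs" key (the key used in the config snippet for static variables) and that the variable's own values win over the defaults; the precedence rule is an assumption, not confirmed by this page.

```python
from typing import Any, Dict, Optional


def merge_attributes(variable: Dict[str, Any], defaults: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: combine default and variable-specific attributes."""
    defaults = defaults or {}
    merged = dict(defaults.get("attrs", {}))  # start from a copy of the defaults
    merged.update(variable.get("attrs", {}))  # assumption: variable-specific values take precedence
    return merged


defaults = {"attrs": {"_FillValue": -9999, "units": "1"}}
temperature = {"dims": ["time"], "type": "float", "attrs": {"units": "degC"}}
print(merge_attributes(temperature, defaults))  # {'_FillValue': -9999, 'units': 'degC'}
```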

_parse_dimensions(self, dictionary: Dict, available_dimensions: Dict[str, tsdat.config.dimension_definition.DimensionDefinition], defaults: Union[Dict, None] = None) → Dict[str, tsdat.config.dimension_definition.DimensionDefinition]

Parses the variable’s dimensions from the variable dictionary.

Parameters
  • dictionary (Dict) – The dictionary entry corresponding with this variable in the config file.

  • available_dimensions – A mapping of dimension name to DimensionDefinition.

  • defaults (Dict, optional) – The defaults to use when instantiating the VariableDefinition object, defaults to {}.

Returns

A mapping of dimension name to DimensionDefinition objects.

Return type

Dict[str, DimensionDefinition]

_parse_data_type(self, dictionary: Dict, defaults: Union[Dict, None] = None) → object

Parses the data_type string and returns the appropriate numpy data type (e.g. “float” -> np.float).

Parameters
  • dictionary (Dict) – The dictionary entry corresponding with this variable in the config file.

  • defaults (Dict, optional) – The defaults to use when instantiating the VariableDefinition object, defaults to {}.

Raises

KeyError – Raises KeyError if the data type in the dictionary does not match a valid data type.

Returns

The numpy data type corresponding with the type provided in the yaml file, or data_type if the provided data_type is not in the ME Data Standards list of data types.

Return type

object

add_fillvalue_if_none(self, attributes: Dict[str, Any]) → Dict[str, Any]

Adds the _FillValue attribute to the provided attributes dictionary if the _FillValue attribute has not already been defined and returns the modified attributes dictionary.

Parameters

attributes (Dict[str, Any]) – The dictionary containing user-defined variable attributes.

Returns

The dictionary containing user-defined variable attributes, guaranteed to contain a _FillValue attribute.

Return type

Dict[str, Any]
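A minimal sketch of the behavior described above, written as a standalone function rather than the tsdat method. It assumes the same -9999 default that get_FillValue documents; the fill value tsdat actually inserts may differ (for example by data type).

```python
from typing import Any, Dict

DEFAULT_FILL_VALUE = -9999  # assumption: mirrors the default documented for get_FillValue


def add_fillvalue_if_none(attributes: Dict[str, Any]) -> Dict[str, Any]:
    # setdefault leaves an existing _FillValue untouched and adds one otherwise;
    # the input dictionary is modified in place and also returned.
    attributes.setdefault("_FillValue", DEFAULT_FILL_VALUE)
    return attributes


print(add_fillvalue_if_none({"units": "degC"}))     # {'units': 'degC', '_FillValue': -9999}
print(add_fillvalue_if_none({"_FillValue": -1.0}))  # {'_FillValue': -1.0}
```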

is_constant(self) → bool

Returns True if the variable is a constant. A variable is constant if it does not have any dimensions.

Returns

True if the variable is constant, False otherwise.

Return type

bool

is_predefined(self) → bool

Returns True if the variable’s data was predefined in the config yaml file.

Returns

True if the variable is predefined, False otherwise.

Return type

bool

is_coordinate(self) → bool

Returns True if the variable is a coordinate variable. A variable is defined as a coordinate variable if it is dimensioned by itself.

Returns

True if the variable is a coordinate variable, False otherwise.

Return type

bool
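The constant and coordinate definitions above can be sketched on plain dictionaries. This hypothetical version reads the "dims" key used in the config snippet for static variables, and it treats "dimensioned by itself" as having exactly one dimension whose name equals the variable's name; that single-dimension reading is an assumption.

```python
from typing import Any, Dict


def is_constant(variable: Dict[str, Any]) -> bool:
    # Constant: the variable has no dimensions at all.
    return len(variable.get("dims", [])) == 0


def is_coordinate(name: str, variable: Dict[str, Any]) -> bool:
    # Coordinate: dimensioned by itself, e.g. 'time' with dims ['time'].
    return variable.get("dims", []) == [name]


print(is_coordinate("time", {"dims": ["time"]}))         # True
print(is_coordinate("temperature", {"dims": ["time"]}))  # False
print(is_constant({"data": 42}))                          # True
```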

is_derived(self) → bool

Returns True if the variable is derived. A variable is derived if it does not have an input and it is not predefined.

Returns

True if the Variable is derived, False otherwise.

Return type

bool
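The predefined and derived rules combine with has_input (documented below) into a simple classification. The sketch works on plain dictionaries and does not import tsdat; the "data" key comes from the static-variable snippet in this reference, while the "input" key name and the example variables are assumptions.

```python
from typing import Any, Dict


def is_predefined(variable: Dict[str, Any]) -> bool:
    # Predefined (static) variables carry their values in a 'data' entry.
    return "data" in variable


def has_input(variable: Dict[str, Any]) -> bool:
    # Assumption: variables copied from the input declare an 'input' section.
    return "input" in variable


def is_derived(variable: Dict[str, Any]) -> bool:
    # Derived: neither read from the input nor predefined in the config.
    return not has_input(variable) and not is_predefined(variable)


depth = {"data": [4, 8, 12], "dims": ["depth"]}
temperature = {"input": {"name": "temp_raw"}, "dims": ["time"]}
heat_index = {"dims": ["time"]}  # hypothetical variable computed by the pipeline
print([is_derived(v) for v in (depth, temperature, heat_index)])  # [False, False, True]
```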

has_converter(self) → bool

Returns True if the variable has an input converter defined, False otherwise.

Returns

True if the Variable has a converter defined, False otherwise.

Return type

bool

is_required(self) → bool

Returns True if the variable has the ‘required’ property defined and the ‘required’ property evaluates to True. A required variable is a variable which must be retrieved from the input dataset. If a required variable is not in the input dataset, the process should crash.

Returns

True if the variable is required, False otherwise.

Return type

bool
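A check that enforces the "required" rule might look like the following. It is a hypothetical helper (check_required), not part of tsdat; it assumes the ‘required’ flag and the input variable name live in an "input" section, that the input name defaults to the output name, and that a missing required variable is reported with KeyError.

```python
from typing import Any, Dict, Iterable


def check_required(variables: Dict[str, Dict[str, Any]], input_names: Iterable[str]) -> None:
    """Hypothetical check: fail fast if a required input variable is missing."""
    available = set(input_names)
    for name, variable in variables.items():
        var_input = variable.get("input", {})
        # Assumptions: 'required' sits in the input section, and the input
        # variable name falls back to the output name when not given.
        if var_input.get("required", False) and var_input.get("name", name) not in available:
            raise KeyError(f"Required variable '{name}' not found in input dataset")


variables = {
    "temperature": {"input": {"name": "temp_raw", "required": True}},
    "rh": {"input": {"name": "rh_raw"}},  # optional: absence is tolerated
}
check_required(variables, ["time", "temp_raw"])  # passes silently
```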

has_input(self) → bool

Returns True if the variable is copied from an input dataset, regardless of whether or not unit and/or naming conversions should be applied.

Returns

True if the Variable has an input defined, False otherwise.

Return type

bool

get_input_name(self) → str

Returns the name of the variable in the input if defined, otherwise returns None.

Returns

The name of the variable in the input, or None.

Return type

str

get_input_units(self) → str

If the variable has input, returns the units of the input variable or the output units if no input units are defined.

Returns

The units of the input variable data.

Return type

str
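The fallback from input units to output units can be sketched on plain dictionaries. The key names ("input", "units", "attrs") and the None result for variables without an input are assumptions for illustration; this does not import tsdat.

```python
from typing import Any, Dict, Optional


def get_output_units(variable: Dict[str, Any]) -> Optional[str]:
    # None when no 'units' attribute is defined.
    return variable.get("attrs", {}).get("units")


def get_input_units(variable: Dict[str, Any]) -> Optional[str]:
    var_input = variable.get("input")
    if var_input is None:
        return None  # assumption: variables without an input report no input units
    # Fall back to the output units when the input does not declare its own.
    return var_input.get("units", get_output_units(variable))


temperature = {"input": {"name": "temp_f", "units": "degF"}, "attrs": {"units": "degC"}}
pressure = {"input": {"name": "pres"}, "attrs": {"units": "hPa"}}
print(get_input_units(temperature), get_input_units(pressure))  # degF hPa
```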

get_output_units(self) → str

Returns the units of the output data or None if no units attribute has been defined.

Returns

The units of the output variable data.

Return type

str

get_coordinate_names(self) → List[str]

Returns the names of the coordinate VariableDefinition(s) that this VariableDefinition is dimensioned by.

Returns

A list of dimension/coordinate variable names.

Return type

List[str]

get_shape(self) → Tuple[int]

Returns the shape of the data attribute on the VariableDefinition.

Raises

KeyError – Raises a KeyError if the data attribute has not been set yet.

Returns

The shape of the VariableDefinition’s data, or None.

Return type

Tuple[int]

get_data_type(self) → numpy.dtype

Retrieves the variable’s data type.

Returns

Returns the data type of the variable’s data as a numpy dtype.

Return type

np.dtype

get_FillValue(self) → int

Retrieves the variable’s _FillValue attribute, using -9999 as a default if it has not been defined.

Returns

Returns the variable’s _FillValue.

Return type

int

run_converter(self, data: numpy.ndarray) → numpy.ndarray

If the variable has an input converter, runs the input converter for the input/output units on the provided data.

Parameters

data (np.ndarray) – The data to be converted.

Returns

Returns the data after it has been run through the variable’s converter.

Return type

np.ndarray
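Conceptually, running a converter means picking a conversion for the (input units, output units) pair and applying it to every value. The sketch below uses a hypothetical CONVERTERS registry and plain Python lists instead of numpy arrays so it runs without third-party packages; tsdat's actual converter mechanism is not described on this page.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical converter registry keyed by (input units, output units).
CONVERTERS: Dict[Tuple[str, str], Callable[[float], float]] = {
    ("degF", "degC"): lambda f: (f - 32.0) * 5.0 / 9.0,
    ("hPa", "Pa"): lambda p: p * 100.0,
}


def run_converter(data: List[float], input_units: str, output_units: str) -> List[float]:
    if input_units == output_units:
        return list(data)  # nothing to convert
    convert = CONVERTERS[(input_units, output_units)]  # KeyError if no converter is registered
    return [convert(value) for value in data]


print(run_converter([32.0, 212.0], "degF", "degC"))  # [0.0, 100.0]
```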

to_dict(self) → Dict

Returns the Variable as a dictionary to be used to initialize an empty xarray Dataset or DataArray.

Returns a dictionary like (Example is for temperature):

{
    "dims": ["time"],
    "data": [],
    "attrs": {"units": "degC"}
}

Returns

A dictionary representation of the variable.

Return type

Dict
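The dictionary layout shown above can be produced by a small helper. variable_to_dict is hypothetical and takes the dimensions and attributes directly instead of reading them from a VariableDefinition; the resulting {"dims", "data", "attrs"} shape is the form that xarray's DataArray.from_dict accepts.

```python
from typing import Any, Dict, List


def variable_to_dict(dims: List[str], attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper producing the documented to_dict() layout."""
    # Copies are taken so later edits to the result do not leak back into the inputs.
    return {"dims": list(dims), "data": [], "attrs": dict(attrs)}


temperature = variable_to_dict(["time"], {"units": "degC"})
print(temperature)  # {'dims': ['time'], 'data': [], 'attrs': {'units': 'degC'}}
```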

class tsdat.config.DatasetDefinition(dictionary: Dict, datastream_name: str)

Wrapper for the dataset_definition portion of the pipeline config file.

Parameters
  • dictionary (Dict) – The portion of the config file corresponding with the dataset definition.

  • datastream_name (str) – The name of the datastream that the config file is for.

_parse_dimensions(self, dictionary: Dict) → Dict[str, tsdat.config.dimension_definition.DimensionDefinition]

Extracts the dimensions from the dataset_definition portion of the config file.

Parameters

dictionary (Dict) – The dataset_definition dictionary from the config file.

Returns

Returns a mapping of output dimension names to DimensionDefinition objects.

Return type

Dict[str, DimensionDefinition]

_parse_variables(self, dictionary: Dict, available_dimensions: Dict[str, tsdat.config.dimension_definition.DimensionDefinition]) → Dict[str, tsdat.config.variable_definition.VariableDefinition]

Extracts the variables from the dataset_definition portion of the config file.

Parameters
  • dictionary (Dict) – The dataset_definition dictionary from the config file.

  • available_dimensions (Dict[str, DimensionDefinition]) – The DimensionDefinition objects that have already been parsed.

Returns

Returns a mapping of output variable names to VariableDefinition objects.

Return type

Dict[str, VariableDefinition]

_parse_coordinates(self, vars: Dict[str, tsdat.config.variable_definition.VariableDefinition]) → Tuple[Dict[str, tsdat.config.variable_definition.VariableDefinition], Dict[str, tsdat.config.variable_definition.VariableDefinition]]

Separates coordinate variables and data variables.

Determines which variables are coordinate variables and moves those variables from self.vars to self.coords. Coordinate variables are defined as variables that are dimensioned by themselves, i.e., var.name == var.dim.name is a true statement for coordinate variables, but false for data variables.

Parameters

vars (Dict[str, VariableDefinition]) – The dictionary of VariableDefinition objects to check.

Returns

A tuple of two mappings of variable name to VariableDefinition: one holding the coordinate variables and one holding the data variables.

Return type

Tuple[Dict[str, VariableDefinition], Dict[str, VariableDefinition]]
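The separation described above can be sketched with two dictionary comprehensions over plain dicts. The helper split_coordinates is hypothetical; returning the coordinates first is an assumption, and a coordinate is taken to be a variable whose only dimension shares its name.

```python
from typing import Any, Dict, Tuple

VarMap = Dict[str, Dict[str, Any]]


def split_coordinates(variables: VarMap) -> Tuple[VarMap, VarMap]:
    """Return (coordinate variables, data variables); the ordering is an assumption."""
    coords = {n: v for n, v in variables.items() if v.get("dims") == [n]}
    data_vars = {n: v for n, v in variables.items() if n not in coords}
    return coords, data_vars


variables = {
    "time": {"dims": ["time"]},
    "depth": {"dims": ["depth"], "data": [4, 8, 12]},
    "temperature": {"dims": ["time", "depth"]},
}
coords, data_vars = split_coordinates(variables)
print(sorted(coords), sorted(data_vars))  # ['depth', 'time'] ['temperature']
```

Keeping the two mappings separate is what lets get_variable_names exclude coordinate variables without re-checking each variable.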

_validate_dataset_definition(self)

Performs sanity checks on the DatasetDefinition object.

Raises

DefinitionError – If any sanity checks fail.

get_attr(self, attribute_name) → Any

Retrieves the value of the attribute requested, or None if it does not exist.

Parameters

attribute_name (str) – The name of the attribute to retrieve.

Returns

The value of the attribute, or None.

Return type

Any

get_variable_names(self) → List[str]

Retrieves the list of variable names. Note that this excludes coordinate variables.

Returns

The list of variable names.

Return type

List[str]

get_variable(self, variable_name: str) → tsdat.config.variable_definition.VariableDefinition

Attempts to retrieve the requested variable. First searches the data variables, then searches the coordinate variables. Returns None if no data or coordinate variables have been defined with the requested variable name.

Parameters

variable_name (str) – The name of the variable to retrieve.

Returns

Returns the VariableDefinition for the variable, or None if the variable could not be found.

Return type

VariableDefinition
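The lookup order described above (data variables first, then coordinates, else None) can be sketched with plain dictionaries standing in for the two VariableDefinition mappings; the function name mirrors the method but is a standalone, hypothetical version.

```python
from typing import Any, Dict, Optional

VarMap = Dict[str, Dict[str, Any]]


def get_variable(name: str, data_vars: VarMap, coords: VarMap) -> Optional[Dict[str, Any]]:
    # Data variables are searched first, then coordinate variables.
    if name in data_vars:
        return data_vars[name]
    return coords.get(name)  # None if the name is defined in neither mapping


data_vars = {"temperature": {"dims": ["time"]}}
coords = {"time": {"dims": ["time"]}}
print(get_variable("time", data_vars, coords))     # {'dims': ['time']}
print(get_variable("missing", data_vars, coords))  # None
```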

get_coordinates(self, variable: tsdat.config.variable_definition.VariableDefinition) → List[tsdat.config.variable_definition.VariableDefinition]

Returns the coordinate VariableDefinition object(s) that dimension the requested VariableDefinition.

Parameters

variable (VariableDefinition) – The VariableDefinition whose coordinate variables should be retrieved.

Returns

A list of VariableDefinition coordinate variables that dimension the provided VariableDefinition.

Return type

List[VariableDefinition]
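Because a coordinate variable shares its name with the dimension it defines, the coordinates of a variable can be found by looking up each of its dimension names. The standalone sketch below works on plain dicts; silently skipping dimensions that have no coordinate variable is an assumption (tsdat may instead reject such definitions during validation).

```python
from typing import Any, Dict, List

VarMap = Dict[str, Dict[str, Any]]


def get_coordinates(variable: Dict[str, Any], coords: VarMap) -> List[Dict[str, Any]]:
    # Each dimension name doubles as the name of its coordinate variable;
    # dimensions without a matching coordinate are skipped (assumption).
    return [coords[dim] for dim in variable.get("dims", []) if dim in coords]


coords = {"time": {"dims": ["time"]}, "depth": {"dims": ["depth"], "data": [4, 8, 12]}}
temperature = {"dims": ["time", "depth"]}
print([c["dims"][0] for c in get_coordinates(temperature, coords)])  # ['time', 'depth']
```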

get_static_variables(self) → List[tsdat.config.variable_definition.VariableDefinition]

Retrieves a list of static VariableDefinition objects. A variable is defined as static if it has a “data” section in the config file, which would mean that the variable’s data is defined statically. For example, in the config file snippet below, “depth” is a static variable:

depth:
  data: [4, 8, 12]
  dims: [depth]
  type: int
  attrs:
    long_name: Depth
    units: m

Returns

The list of static VariableDefinition objects.

Return type

List[VariableDefinition]
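The "static" rule amounts to filtering variables on the presence of a "data" entry. In the sketch below the depth entry from the snippet above is written as a Python dict and the temperature entry is hypothetical; unlike the real method, this standalone helper returns variable names rather than VariableDefinition objects.

```python
from typing import Any, Dict, List

# The 'depth' entry from the snippet above as a Python dict; 'temperature' is hypothetical.
variables: Dict[str, Dict[str, Any]] = {
    "depth": {"data": [4, 8, 12], "dims": ["depth"], "type": "int",
              "attrs": {"long_name": "Depth", "units": "m"}},
    "temperature": {"dims": ["time", "depth"], "type": "float", "attrs": {"units": "degC"}},
}


def get_static_variable_names(variables: Dict[str, Dict[str, Any]]) -> List[str]:
    # Static variables are the ones whose values are given in a 'data' section.
    return [name for name, var in variables.items() if "data" in var]


print(get_static_variable_names(variables))  # ['depth']
```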

class tsdat.config.QualityManagerDefinition(name: str, dictionary: Dict)

Wrapper for the quality_management portion of the pipeline config file.

Parameters
  • name (str) – The name of the quality manager in the config file.

  • dictionary (Dict) – The dictionary contents of the quality manager from the config file.