tsdat.utils

The tsdat.utils package provides helper classes for working with xarray datasets.

Package Contents

Classes

DSUtil

Provides helper functions for xarray.Dataset

Converter

Base class for converting data arrays from one unit to another.

DefaultConverter

Default class for converting units on data arrays. This class utilizes ACT.utils.data_utils.convert_units.

StringTimeConverter

Convert a time string to a np.datetime64, which is needed for xarray.

TimestampTimeConverter

Convert a numeric UTC timestamp to a np.datetime64, which is needed for xarray.

class tsdat.utils.DSUtil

Provides helper functions for xarray.Dataset

static record_corrections_applied(ds: xarray.Dataset, variable: str, correction: str)

Records a description of a correction made to a variable in that variable’s corrections_applied attribute.

Parameters
  • ds (xr.Dataset) – Dataset containing the corrected variable

  • variable (str) – The name of the variable that was corrected

  • correction (str) – A description of the correction
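A minimal sketch of the idea, assuming the corrections_applied attribute holds a list of strings (tsdat's actual storage format may differ):

```python
import xarray as xr


def record_corrections_applied(ds: xr.Dataset, variable: str, correction: str) -> None:
    """Sketch: append a correction note to ds[variable].attrs["corrections_applied"].

    Assumption: the attribute is stored as a list of strings.
    """
    # ds[variable] wraps the dataset's own Variable, so editing .attrs updates ds in place
    attrs = ds[variable].attrs
    # Start a new list if no corrections have been recorded yet
    corrections = list(attrs.get("corrections_applied", []))
    corrections.append(correction)
    attrs["corrections_applied"] = corrections
```

Calling it twice on the same variable accumulates both descriptions in order.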

static datetime64_to_string(datetime64: numpy.datetime64) → Tuple[str, str]

Convert a datetime64 object to formatted date and time strings.

Parameters

datetime64 (Union[np.ndarray, np.datetime64]) – The datetime64 object

Returns

A tuple of strings representing the formatted date. The first string is the day in ‘yyyymmdd’ format. The second string is the time in ‘hhmmss’ format.

Return type

Tuple[str, str]
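A minimal NumPy sketch of this conversion, assuming that when an array is passed only its first element is formatted and that sub-second precision is simply truncated:

```python
from datetime import datetime
from typing import Tuple, Union

import numpy as np


def datetime64_to_string(datetime64: Union[np.ndarray, np.datetime64]) -> Tuple[str, str]:
    # Assumption: for array input, format the first element only
    value = np.asarray(datetime64).ravel()[0]
    # Truncate to whole seconds so .astype(datetime) yields a datetime, not an int
    dt = value.astype("datetime64[s]").astype(datetime)
    return dt.strftime("%Y%m%d"), dt.strftime("%H%M%S")
```

The intermediate cast to second resolution matters: casting a nanosecond-resolution datetime64 straight to a Python datetime returns an integer instead.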

static datetime64_to_timestamp(variable_data: numpy.ndarray) → numpy.ndarray

Converts each datetime64 value to a timestamp in the same units as the variable (e.g., seconds, nanoseconds).

Parameters

variable_data (np.ndarray) – ndarray of variable data

Returns

An ndarray of the same shape, with time values converted to long timestamps (e.g., int64)

Return type

np.ndarray
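A sketch of the conversion, assuming a plain int64 cast is sufficient. Casting datetime64 data to int64 yields counts since the Unix epoch in the array's own unit, which is what keeps the output in the same units as the input:

```python
import numpy as np


def datetime64_to_timestamp(variable_data: np.ndarray) -> np.ndarray:
    # datetime64[s] -> seconds since the epoch, datetime64[ns] -> nanoseconds, etc.
    # The result has the same shape as the input.
    return variable_data.astype(np.int64)
```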

static get_datastream_name(ds: xarray.Dataset = None, config=None) → str

Returns the datastream name defined in the dataset or in the provided pipeline configuration.

Parameters
  • ds (xr.Dataset, optional) – The data as an xarray dataset; defaults to None.

  • config (Config, optional) – The pipeline Config object in which the datastream name may be defined; defaults to None.

Returns

The datastream name

Return type

str

static get_end_time(ds: xarray.Dataset) → Tuple[str, str]

Convenience method to get the end date and time from an xarray dataset.

Parameters

ds (xr.Dataset) – The dataset

Returns

A tuple of strings representing the last time data point in the dataset. The first string is the day in ‘yyyymmdd’ format. The second string is the time in ‘hhmmss’ format.

Return type

Tuple[str, str]
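A sketch of the same lookup with xarray and NumPy, assuming the time coordinate is named time and is sorted in ascending order:

```python
from typing import Tuple

import numpy as np
import xarray as xr


def get_end_time(ds: xr.Dataset) -> Tuple[str, str]:
    # Assumption: the time coordinate is called "time" and is sorted ascending
    last = ds["time"].values[-1]
    # Truncate to whole seconds, e.g. "2021-01-20T23:59:59"
    stamp = str(np.datetime_as_string(last, unit="s"))
    day, clock = stamp.split("T")
    # Strip separators to get 'yyyymmdd' and 'hhmmss'
    return day.replace("-", ""), clock.replace(":", "")
```

Swapping values[-1] for values[0] gives the corresponding start time.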

static get_fill_value(ds: xarray.Dataset, variable_name: str)

Get the value of the _FillValue attribute for the given variable.

Parameters
  • ds (xr.Dataset) – The dataset

  • variable_name (str) – A variable in the dataset

Returns

The value of the _FillValue attribute, or None if it is not defined

Return type

Same data type as the variable (int, float, etc.) or None
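A sketch of this lookup, assuming the fill value is read from the variable's attrs:

```python
import xarray as xr


def get_fill_value(ds: xr.Dataset, variable_name: str):
    # dict.get returns None when the variable has no _FillValue attribute
    return ds[variable_name].attrs.get("_FillValue", None)
```

Note that when xarray opens a file with its default CF decoding, _FillValue is moved from .attrs into .encoding, so a lookup that checks only attrs would return None for such a variable.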

static get_non_qc_variable_names(ds: xarray.Dataset) → List[str]

Get a list of all data variables in the dataset that are NOT qc variables.

Parameters

ds (xr.Dataset) – A dataset

Returns

List of non-qc data variable names

Return type

List[str]
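A sketch assuming QC variables follow a qc_<variable_name> naming convention (an assumption here, not stated above):

```python
from typing import List

import xarray as xr


def get_non_qc_variable_names(ds: xr.Dataset) -> List[str]:
    # Keep data variables (not coordinates) whose names lack the "qc_" prefix
    return [str(name) for name in ds.data_vars if not str(name).startswith("qc_")]
```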

static get_raw_end_time(raw_ds: xarray.Dataset, time_var_definition) → Tuple[str, str]

Convenience method to get the end date and time from a raw xarray dataset. This uses time_var_definition.get_input_name() as the dataset key for the time variable and additionally uses the input’s Converter object if applicable.

Parameters
  • raw_ds (xr.Dataset) – A raw dataset (not standardized)

  • time_var_definition (VariableDefinition) – The ‘time’ variable definition from the pipeline config

Returns

A tuple of strings representing the last time data point in the dataset. The first string is the day in ‘yyyymmdd’ format. The second string is the time in ‘hhmmss’ format.

Return type

Tuple[str, str]

static get_raw_start_time(raw_ds: xarray.Dataset, time_var_definition) → Tuple[str, str]

Convenience method to get the start date and time from a raw xarray dataset. This uses time_var_definition.get_input_name() as the dataset key for the time variable and additionally uses the input’s Converter object if applicable.

Parameters
  • raw_ds (xr.Dataset) – A raw dataset (not standardized)

  • time_var_definition (VariableDefinition) – The ‘time’ variable definition from the pipeline config

Returns

A tuple of strings representing the first time data point in the dataset. The first string is the day in ‘yyyymmdd’ format. The second string is the time in ‘hhmmss’ format.

Return type

Tuple[str, str]

static get_coordinate_variable_names(ds: xarray.Dataset) → List[str]

Get a list of all coordinate variables in this dataset.

Parameters

ds (xr.Dataset) – The dataset

Returns

List of coordinate variable names

Return type

List[str]

static get_start_time(ds: xarray.Dataset) → Tuple[str, str]

Convenience method to get the start date and time from an xarray dataset.

Parameters

ds (xr.Dataset) – A standardized dataset

Returns

A tuple of strings representing the first time data point in the dataset. The first string is the day in ‘yyyymmdd’ format. The second string is the time in ‘hhmmss’ format.

Return type

Tuple[str, str]

static get_metadata(ds: xarray.Dataset) → Dict

Get a dictionary of all global and variable attributes in a dataset. Global attributes are found under the ‘attributes’ key and variable attributes under the ‘variables’ key.

Parameters

ds (xr.Dataset) – A dataset

Returns

A dictionary of global & variable attributes

Return type

Dict
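A sketch of the returned structure, assuming variable attributes are keyed by variable name and that coordinate variables are included (tsdat's exact nesting may differ):

```python
from typing import Dict

import xarray as xr


def get_metadata(ds: xr.Dataset) -> Dict:
    return {
        # Global (dataset-level) attributes
        "attributes": dict(ds.attrs),
        # Per-variable attributes, including coordinate variables
        "variables": {str(name): dict(var.attrs) for name, var in ds.variables.items()},
    }
```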

static plot_qc(ds: xarray.Dataset, variable_name: str, filename: str = None, **kwargs) → act.plotting.TimeSeriesDisplay

Create a QC plot for the given variable. This is based on the ACT library: https://arm-doe.github.io/ACT/source/auto_examples/plot_qc.html#sphx-glr-source-auto-examples-plot-qc-py

We provide a convenience wrapper method for basic QC plots of a variable, but we recommend using ACT directly and consulting its examples for more complex plots, such as plotting variables from two different datasets.

TODO: Depending on use cases, we will likely add more arguments to be able to quickly produce the most common types of QC plots.

Parameters
  • ds (xr.Dataset) – A dataset

  • variable_name (str) – The variable to plot

  • filename (str, optional) – The filename for the image. Saves the plot as this filename if provided.

static get_plot_filename(dataset: xarray.Dataset, plot_description: str, extension: str) → str

Returns the filename for a plot according to MHKiT-Cloud Data Standards. The dataset is used to determine the datastream_name and start date/time. The standards dictate that a plot filename should follow the format: datastream_name.date.time.description.extension.

Parameters
  • dataset (xr.Dataset) – The dataset from which the plot data is drawn. This is used to collect the datastream_name and start date/time.

  • plot_description (str) – The description of the plot. Should be as brief as possible and contain no spaces. Underscores may be used.

  • extension (str) – The file extension for the plot.

Returns

The standardized plot filename.

Return type

str
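A sketch of the naming rule using a hypothetical helper, build_plot_filename, that takes the datastream name and start date/time directly instead of extracting them from a dataset. Accepting the extension with or without a leading ‘.’ is also an assumption:

```python
def build_plot_filename(
    datastream_name: str,
    start_date: str,
    start_time: str,
    plot_description: str,
    extension: str,
) -> str:
    # Descriptions should be brief and contain no spaces (underscores are fine)
    if " " in plot_description:
        raise ValueError("plot_description must not contain spaces")
    # Assumption: tolerate "png" as well as ".png"
    extension = extension.lstrip(".")
    # datastream_name.date.time.description.extension
    return ".".join([datastream_name, start_date, start_time, plot_description, extension])
```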

static get_dataset_filename(dataset: xarray.Dataset, file_extension='.nc') → str

Given an xarray dataset, this function returns the base filename of the dataset according to MHKiT-Cloud Data Standards. The base filename does not include the directory structure where the file should be saved, only the name of the file itself, e.g., z05.ExampleBuoyDatastream.b1.20201230.000000.nc

Parameters
  • dataset (xr.Dataset) – The dataset whose filename should be generated.

  • file_extension (str, optional) – The file extension to use. Defaults to “.nc”

Returns

The base filename of the dataset.

Return type

str
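A sketch using a hypothetical helper, build_dataset_filename, that assembles the example filename above from its parts (datastream name, start date, start time, extension) rather than reading them from a dataset:

```python
def build_dataset_filename(
    datastream_name: str, start_date: str, start_time: str, file_extension: str = ".nc"
) -> str:
    # file_extension already includes its leading "." (matching the ".nc" default)
    return f"{datastream_name}.{start_date}.{start_time}{file_extension}"
```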

static get_raw_filename(raw_dataset: xarray.Dataset, old_filename: str, config) → str

Returns the appropriate raw filename of the raw dataset according to MHKiT-Cloud naming conventions. Uses the config object to parse the start date and time from the raw dataset for use in the new filename.

The new filename will follow the MHKiT-Cloud Data Standards for raw filenames, i.e., datastream_name.date.time.raw.old_filename, where the data level used in the datastream_name is 00.

Parameters
  • raw_dataset (xr.Dataset) – The raw data as an xarray dataset.

  • old_filename (str) – The name of the original raw file.

  • config (Config) – The Config object used to assist reading time data from the raw_dataset.

Returns

The standardized filename of the raw file.

Return type

str
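A sketch with a hypothetical helper, build_raw_filename, assuming the data level is the final ‘.’-separated component of the datastream name and that only the base name of old_filename is kept:

```python
import os


def build_raw_filename(
    datastream_name: str, start_date: str, start_time: str, old_filename: str
) -> str:
    # Assumption: the data level (e.g. "b1") is the last "."-separated field
    location_and_name, _level = datastream_name.rsplit(".", 1)
    raw_datastream = f"{location_and_name}.00"
    # Assumption: drop any leading directories from the original file path
    return ".".join(
        [raw_datastream, start_date, start_time, "raw", os.path.basename(old_filename)]
    )
```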

static get_date_from_filename(filename: str) → str

Given a filename that conforms to MHKiT-Cloud Data Standards, return the date of the first point of data in the file.

Parameters

filename (str) – The filename or path to the file.

Returns

The date, in “yyyymmdd.hhmmss” format.

Return type

str
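A sketch of the parsing, assuming the date and time are the fourth and fifth ‘.’-separated fields of the file name (right after the three-part datastream name):

```python
import os


def get_date_from_filename(filename: str) -> str:
    # Work with the file name only, not any leading directories
    parts = os.path.basename(filename).split(".")
    # Fields 0-2 form the datastream name; fields 3 and 4 are the date and time
    return f"{parts[3]}.{parts[4]}"
```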

static get_datastream_name_from_filename(filename: str) → Optional[str]

Given a filename that conforms to MHKiT-Cloud Data Standards, return the datastream name. The datastream name is everything to the left of the third ‘.’ in the filename.

e.g., for humboldt_ca.buoy_data.b1.20210120.000000.nc the datastream name is humboldt_ca.buoy_data.b1.

Parameters

filename (str) – The filename or path to the file.

Returns

The datastream name, or None if the filename is not in the proper format.

Return type

Optional[str]
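A sketch of the rule, assuming a file name with fewer than three ‘.’ characters counts as not properly formatted:

```python
import os
from typing import Optional


def get_datastream_name_from_filename(filename: str) -> Optional[str]:
    parts = os.path.basename(filename).split(".")
    # Fewer than three "." separators: nothing lies left of a third "."
    if len(parts) < 4:
        return None
    # Everything to the left of the third "."
    return ".".join(parts[:3])
```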

static get_datastream_directory(datastream_name: str, root: str = '') → str

Given the datastream_name and an optional root, returns the path to where the datastream should be located. Does NOT create the directory where the datastream should be located.

Parameters
  • datastream_name (str) – The name of the datastream whose directory path should be generated.

  • root (str, optional) – The directory to use as the root of the directory structure. Defaults to “”.

Returns

The path to the directory where the datastream should be located.

Return type

str

static is_image(filename: str) → bool

Detect the mimetype from the file extension and use it to determine whether the file is an image.

Parameters

filename (str) – The name of the file to check

Returns

True if the file extension matches an image mimetype

Return type

bool
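A sketch using Python's standard mimetypes module, which guesses a MIME type from the extension alone without opening the file:

```python
import mimetypes


def is_image(filename: str) -> bool:
    # guess_type returns (type, encoding); type is None for unknown extensions
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type is not None and mime_type.startswith("image/")
```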

class tsdat.utils.Converter(parameters: Union[Dict, None] = None)

Bases: abc.ABC

Base class for converting data arrays from one unit to another. Users can extend this class if they have a special units conversion for their input data that cannot be resolved with the default converter classes.

Parameters

parameters (dict, optional) – A dictionary of converter-specific parameters which get passed from the pipeline config file. Defaults to {}

abstract run(self, data: numpy.ndarray, in_units: str, out_units: str) → numpy.ndarray

Convert the input data from in_units to out_units.

Parameters
  • data (np.ndarray) – Data array to be modified.

  • in_units (str) – Current units of the data array.

  • out_units (str) – Units to be converted to.

Returns

Data array converted into the new units.

Return type

np.ndarray
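A sketch of a custom converter. To stay self-contained it defines a stand-in Converter that mirrors the documented signature; in a real pipeline you would subclass tsdat.utils.Converter instead. Storing the parameters on self.parameters, the LinearCalibrationConverter class, and its scale/offset parameter names are all hypothetical:

```python
import abc
from typing import Dict, Union

import numpy as np


class Converter(abc.ABC):
    """Stand-in mirroring the documented tsdat.utils.Converter interface."""

    def __init__(self, parameters: Union[Dict, None] = None):
        # Assumption: parameters from the pipeline config are kept on the instance
        self.parameters = parameters if parameters is not None else {}

    @abc.abstractmethod
    def run(self, data: np.ndarray, in_units: str, out_units: str) -> np.ndarray:
        ...


class LinearCalibrationConverter(Converter):
    """Hypothetical converter applying out = data * scale + offset."""

    def run(self, data: np.ndarray, in_units: str, out_units: str) -> np.ndarray:
        scale = self.parameters.get("scale", 1.0)
        offset = self.parameters.get("offset", 0.0)
        return np.asarray(data, dtype=float) * scale + offset
```

Because run is abstract, the base class itself cannot be instantiated; only concrete subclasses like the one above can.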

class tsdat.utils.DefaultConverter(parameters: Union[Dict, None] = None)

Bases: Converter

Default class for converting units on data arrays. This class utilizes ACT.utils.data_utils.convert_units and should work for most variables except time (see StringTimeConverter and TimestampTimeConverter).

run(self, data: numpy.ndarray, in_units: str, out_units: str) → numpy.ndarray

Convert the input data from in_units to out_units.

Parameters
  • data (np.ndarray) – Data array to be modified.

  • in_units (str) – Current units of the data array.

  • out_units (str) – Units to be converted to.

Returns

Data array converted into the new units.

Return type

np.ndarray

class tsdat.utils.StringTimeConverter(parameters: Union[Dict, None] = None)

Bases: Converter

Convert a time string to a np.datetime64, which is needed for xarray. This class utilizes pd.to_datetime to perform the conversion.

One of the parameters should be ‘time_format’, which is the strftime format string used to parse the time, e.g., “%d/%m/%Y”. Note that “%f” will parse all the way up to nanoseconds. See the strftime documentation for more information on the available directives.

Parameters

parameters (dict, optional) – A dictionary of converter-specific parameters. Defaults to {}.

run(self, data: numpy.ndarray, in_units: str, out_units: str) → numpy.ndarray

Convert the input data from in_units to out_units.

Parameters
  • data (np.ndarray) – Data array to be modified.

  • in_units (str) – Current units of the data array.

  • out_units (str) – Units to be converted to.

Returns

Data array converted into the new units.

Return type

np.ndarray
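A sketch of the core call, assuming the converter passes its time_format parameter straight to pd.to_datetime as format; convert_time_strings is a hypothetical name:

```python
import numpy as np
import pandas as pd


def convert_time_strings(data: np.ndarray, time_format: str) -> np.ndarray:
    # Parse each string with the strftime-style format, then return a
    # NumPy datetime64 array (the form xarray expects for time coordinates)
    return pd.to_datetime(data, format=time_format).values
```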

class tsdat.utils.TimestampTimeConverter(parameters: Union[Dict, None] = None)

Bases: Converter

Convert a numeric UTC timestamp to a np.datetime64, which is needed for xarray. This class utilizes pd.to_datetime to perform the conversion.

One of the parameters should be ‘unit’, which denotes the unit of the numeric timestamps (e.g., D, s, ms, us, ns); the timestamps may be integers or floats. Timestamps are interpreted relative to the Unix epoch (1970-01-01 00:00:00 UTC).

Parameters

parameters (dict, optional) – A dictionary of converter-specific parameters which get passed from the pipeline config file. Defaults to {}

run(self, data: numpy.ndarray, in_units: str, out_units: str) → numpy.ndarray

Convert the input data from in_units to out_units.

Parameters
  • data (np.ndarray) – Data array to be modified.

  • in_units (str) – Current units of the data array.

  • out_units (str) – Units to be converted to.

Returns

Data array converted into the new units.

Return type

np.ndarray
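A sketch of the core call, assuming the unit parameter is passed straight to pd.to_datetime; convert_timestamps is a hypothetical name:

```python
import numpy as np
import pandas as pd


def convert_timestamps(data: np.ndarray, unit: str) -> np.ndarray:
    # Each number is an offset from the Unix epoch expressed in `unit`;
    # the result is a timezone-naive datetime64 array holding UTC times
    return pd.to_datetime(data, unit=unit).values
```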