tsdat.utils.dsutils
Classes
DSUtil – Provides helper functions for xarray.Dataset.
class tsdat.utils.dsutils.DSUtil
Provides helper functions for xarray.Dataset.
Class Methods
datetime64_to_string – Convert a datetime64 object to a formatted string.
datetime64_to_timestamp – Convert each datetime64 value to a timestamp in the same units as the variable.
get_coordinate_variable_names – Get a list of all coordinate variables in this dataset.
get_dataset_filename – Return the base filename of a dataset according to MHKiT-Cloud Data Standards.
get_datastream_directory – Return the path to where the datastream should be located.
get_datastream_name – Return the datastream name defined in the dataset or in the provided pipeline configuration.
get_datastream_name_from_filename – Return the datastream name from a standards-conforming filename.
get_date_from_filename – Return the date of the first point of data from a standards-conforming filename.
get_end_time – Get the end date and time from an xarray dataset.
get_fill_value – Get the value of the _FillValue attribute for the given variable.
get_metadata – Get a dictionary of all global and variable attributes in a dataset.
get_non_qc_variable_names – Get a list of all data variables in the dataset that are NOT qc variables.
get_plot_filename – Return the filename for a plot according to MHKiT-Cloud Data Standards.
get_raw_end_time – Get the end date and time from a raw xarray dataset.
get_raw_filename – Return the appropriate raw filename of the raw dataset according to MHKiT-Cloud naming conventions.
get_raw_start_time – Get the start date and time from a raw xarray dataset.
get_start_time – Get the start date and time from an xarray dataset.
is_image – Detect the mimetype from the file extension and use it to determine whether the file is an image.
plot_qc – Create a QC plot for the given variable.
record_corrections_applied – Record a description of a correction made to a variable.
Method Descriptions
-
static datetime64_to_string(datetime64: numpy.datetime64) → Tuple[str, str]
Convert a datetime64 object to a formatted string.
- Parameters
datetime64 (Union[np.ndarray, np.datetime64]) – The datetime64 object
- Returns
A tuple of strings representing the formatted date. The first string is the day in ‘yyyymmdd’ format. The second string is the time in ‘hhmmss’ format.
- Return type
Tuple[str, str]
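The day and time formats described above can be illustrated with a minimal sketch. This is not tsdat's implementation; the function name datetime64_to_string_sketch is hypothetical, and the sketch assumes a scalar numpy.datetime64 input rather than an array.

```python
from typing import Tuple

import numpy as np


def datetime64_to_string_sketch(value: np.datetime64) -> Tuple[str, str]:
    """Hypothetical stand-in that mimics the documented output format."""
    # Truncate to whole seconds so .item() yields a datetime.datetime
    # (finer units such as nanoseconds would yield a plain integer).
    dt = value.astype("datetime64[s]").item()
    return dt.strftime("%Y%m%d"), dt.strftime("%H%M%S")


day, time = datetime64_to_string_sketch(np.datetime64("2020-12-30T01:02:03"))
```

Here day holds the 'yyyymmdd' string and time holds the 'hhmmss' string.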
-
static datetime64_to_timestamp(variable_data: numpy.ndarray) → numpy.ndarray
Converts each datetime64 value to a timestamp in the same units as the variable (e.g., seconds, nanoseconds).
- Parameters
variable_data (np.ndarray) – ndarray of variable data
- Returns
An ndarray of the same shape, with time values converted to long timestamps (e.g., int64)
- Return type
np.ndarray
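As a rough illustration of "same units as the variable", casting a datetime64 array to int64 in NumPy preserves the array's own time unit. The sketch below assumes the input is already a datetime64 array; the name datetime64_to_timestamp_sketch is hypothetical and the real method may handle other inputs differently.

```python
import numpy as np


def datetime64_to_timestamp_sketch(variable_data: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: integer timestamps in the array's own unit."""
    # A datetime64[s] array becomes seconds since the epoch,
    # a datetime64[ns] array becomes nanoseconds since the epoch.
    return variable_data.astype(np.int64)


times = np.array(
    ["1970-01-01T00:00:01", "1970-01-01T00:01:00"], dtype="datetime64[s]"
)
stamps = datetime64_to_timestamp_sketch(times)
```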
-
static get_coordinate_variable_names(ds: xarray.Dataset) → List[str]
Get a list of all coordinate variables in this dataset.
- Parameters
ds (xr.Dataset) – The dataset
- Returns
List of coordinate variable names
- Return type
List[str]
-
static get_dataset_filename(dataset: xarray.Dataset, file_extension='.nc') → str
Given an xarray dataset, this function will return the base filename of the dataset according to MHKiT-Cloud Data Standards. The base filename does not include the directory structure where the file should be saved, only the name of the file itself, e.g. z05.ExampleBuoyDatastream.b1.20201230.000000.nc
- Parameters
dataset (xr.Dataset) – The dataset whose filename should be generated.
file_extension (str, optional) – The file extension to use. Defaults to “.nc”
- Returns
The base filename of the dataset.
- Return type
str
-
static get_datastream_directory(datastream_name: str, root: str = '') → str
Given the datastream_name and an optional root, returns the path to where the datastream should be located. Does NOT create the directory where the datastream should be located.
- Parameters
datastream_name (str) – The name of the datastream whose directory path should be generated.
root (str, optional) – The directory to use as the root of the directory structure. Defaults to “”.
- Returns
The path to the directory where the datastream should be located.
- Return type
str
-
static get_datastream_name(ds: xarray.Dataset = None, config=None) → str
Returns the datastream name defined in the dataset or in the provided pipeline configuration.
- Parameters
ds (xr.Dataset, optional) – The data as an xarray dataset; defaults to None.
config (Config, optional) – The Config object used to assist reading time data from the raw_dataset; defaults to None.
- Returns
The datastream name
- Return type
str
-
static get_datastream_name_from_filename(filename: str) → Optional[str]
Given a filename that conforms to MHKiT-Cloud Data Standards, return the datastream name. The datastream name is everything to the left of the third ‘.’ in the filename.
For example, in humboldt_ca.buoy_data.b1.20210120.000000.nc the datastream name is humboldt_ca.buoy_data.b1.
- Parameters
filename (str) – The filename or path to the file.
- Returns
The datastream name, or None if the filename is not in the proper format.
- Return type
Optional[str]
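The "left of the third dot" rule can be sketched in a few lines of plain Python. This is only an illustration: the function name is hypothetical, and treating any name with fewer than three dots as "not in the proper format" is an assumption, since the actual validation is not described here.

```python
import os
from typing import Optional


def datastream_name_from_filename_sketch(filename: str) -> Optional[str]:
    """Hypothetical stand-in: keep everything left of the third '.'."""
    parts = os.path.basename(filename).split(".")
    # Assumption: a name with fewer than three dots is not in the proper format.
    if len(parts) < 4:
        return None
    return ".".join(parts[:3])


name = datastream_name_from_filename_sketch(
    "humboldt_ca.buoy_data.b1.20210120.000000.nc"
)
```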
-
static get_date_from_filename(filename: str) → str
Given a filename that conforms to MHKiT-Cloud Data Standards, return the date of the first point of data in the file.
- Parameters
filename (str) – The filename or path to the file.
- Returns
The date, in “yyyymmdd.hhmmss” format.
- Return type
str
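A possible way to pull the “yyyymmdd.hhmmss” string out of a conforming filename is sketched below. It assumes the date and time are the fourth and fifth dot-separated fields, as in humboldt_ca.buoy_data.b1.20210120.000000.nc; the function name and the ValueError on malformed names are choices made for this sketch, not documented behavior.

```python
import os


def date_from_filename_sketch(filename: str) -> str:
    """Hypothetical stand-in: return 'yyyymmdd.hhmmss' from a filename."""
    parts = os.path.basename(filename).split(".")
    # Assumption: fields are location.dataset.level.date.time[.rest...]
    if len(parts) < 5:
        raise ValueError(f"unexpected filename format: {filename!r}")
    return f"{parts[3]}.{parts[4]}"


date = date_from_filename_sketch("humboldt_ca.buoy_data.b1.20210120.000000.nc")
```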
-
static get_end_time(ds: xarray.Dataset) → Tuple[str, str]
Convenience method to get the end date and time from an xarray dataset.
- Parameters
ds (xr.Dataset) – The dataset
- Returns
A tuple of [day, time] as formatted strings representing the last time point in the dataset.
- Return type
Tuple[str, str]
-
static get_fill_value(ds: xarray.Dataset, variable_name: str)
Get the value of the _FillValue attribute for the given variable.
- Parameters
ds (xr.Dataset) – The dataset
variable_name (str) – A variable in the dataset
- Returns
The value of the _FillValue attr or None if it is not defined
- Return type
Same data type as the variable (int, float, etc.), or None
-
static get_metadata(ds: xarray.Dataset) → Dict
Get a dictionary of all global and variable attributes in a dataset. Global attributes are found under the ‘attributes’ key and variable attributes are found under the ‘variables’ key.
- Parameters
ds (xr.Dataset) – A dataset
- Returns
A dictionary of global & variable attributes
- Return type
Dict
-
static get_non_qc_variable_names(ds: xarray.Dataset) → List[str]
Get a list of all data variables in the dataset that are NOT qc variables.
- Parameters
ds (xr.Dataset) – A dataset
- Returns
List of non-qc data variable names
- Return type
List[str]
-
static get_plot_filename(dataset: xarray.Dataset, plot_description: str, extension: str) → str
Returns the filename for a plot according to MHKiT-Cloud Data Standards. The dataset is used to determine the datastream_name and start date/time. The standards dictate that a plot filename should follow the format: datastream_name.date.time.description.extension.
- Parameters
dataset (xr.Dataset) – The dataset from which the plot data is drawn. This is used to collect the datastream_name and start date/time.
plot_description (str) – The description of the plot. Should be as brief as possible and contain no spaces. Underscores may be used.
extension (str) – The file extension for the plot.
- Returns
The standardized plot filename.
- Return type
str
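The plot filename format above can be illustrated with plain strings. In this sketch the datastream name, date, and time are passed in directly rather than read from a dataset; the function name is hypothetical, and stripping a leading dot from the extension is an assumption.

```python
def plot_filename_sketch(
    datastream_name: str, day: str, time: str, plot_description: str, extension: str
) -> str:
    """Hypothetical stand-in: datastream_name.date.time.description.extension."""
    # Assumption: accept the extension with or without a leading dot.
    ext = extension.lstrip(".")
    return f"{datastream_name}.{day}.{time}.{plot_description}.{ext}"


plot_name = plot_filename_sketch(
    "humboldt_ca.buoy_data.b1", "20210120", "000000", "wind_speed", "png"
)
```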
-
static get_raw_end_time(raw_ds: xarray.Dataset, time_var_definition: tsdat.VariableDefinition) → Tuple[str, str]
Convenience method to get the end date and time from a raw xarray dataset. This uses time_var_definition.get_input_name() as the dataset key for the time variable and additionally uses the input’s Converter object if applicable.
- Parameters
raw_ds (xr.Dataset) – A raw dataset (not standardized)
time_var_definition (VariableDefinition) – The ‘time’ variable definition from the pipeline config
- Returns
A tuple of strings representing the last time data point in the dataset. The first string is the day in ‘yyyymmdd’ format. The second string is the time in ‘hhmmss’ format.
- Return type
Tuple[str, str]
-
static get_raw_filename(raw_dataset: xarray.Dataset, old_filename: str, config) → str
Returns the appropriate raw filename of the raw dataset according to MHKiT-Cloud naming conventions. Uses the config object to parse the start date and time from the raw dataset for use in the new filename.
The new filename will follow the MHKiT-Cloud Data Standards for raw filenames, i.e., datastream_name.date.time.raw.old_filename, where the data level used in the datastream_name is 00.
- Parameters
raw_dataset (xr.Dataset) – The raw data as an xarray dataset.
old_filename (str) – The name of the original raw file.
config (Config) – The Config object used to assist reading time data from the raw_dataset.
- Returns
The standardized filename of the raw file.
- Return type
str
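The raw naming pattern can be sketched the same way with plain strings. The function name and the sample values are hypothetical; in the real method the date and time come from the raw dataset via the config, which this sketch does not attempt to reproduce.

```python
def raw_filename_sketch(
    datastream_name: str, day: str, time: str, old_filename: str
) -> str:
    """Hypothetical stand-in: datastream_name.date.time.raw.old_filename."""
    # The datastream name is expected to carry data level 00 for raw files.
    return f"{datastream_name}.{day}.{time}.raw.{old_filename}"


raw_name = raw_filename_sketch(
    "humboldt_ca.buoy_data.00", "20210120", "000000", "buoy.csv"
)
```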
-
static get_raw_start_time(raw_ds: xarray.Dataset, time_var_definition: tsdat.config.VariableDefinition) → Tuple[str, str]
Convenience method to get the start date and time from a raw xarray dataset. This uses time_var_definition.get_input_name() as the dataset key for the time variable and additionally uses the input’s Converter object if applicable.
- Parameters
raw_ds (xr.Dataset) – A raw dataset (not standardized)
time_var_definition (VariableDefinition) – The ‘time’ variable definition from the pipeline config
- Returns
A tuple of strings representing the first time data point in the dataset. The first string is the day in ‘yyyymmdd’ format. The second string is the time in ‘hhmmss’ format.
- Return type
Tuple[str, str]
-
static get_start_time(ds: xarray.Dataset) → Tuple[str, str]
Convenience method to get the start date and time from an xarray dataset.
- Parameters
ds (xr.Dataset) – A standardized dataset
- Returns
A tuple of strings representing the first time data point in the dataset. The first string is the day in ‘yyyymmdd’ format. The second string is the time in ‘hhmmss’ format.
- Return type
Tuple[str, str]
-
static is_image(filename: str) → bool
Detect the mimetype from the file extension and use it to determine whether the file is an image.
- Parameters
filename (str) – The name of the file to check
- Returns
True if the file extension matches an image mimetype
- Return type
bool
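Extension-based mimetype detection of this kind can be done with Python's standard mimetypes module. The sketch below shows the general technique under that assumption; whether tsdat uses this exact module is not stated here, and the function name is hypothetical.

```python
import mimetypes


def is_image_sketch(filename: str) -> bool:
    """Hypothetical stand-in: True if the extension maps to an image mimetype."""
    # guess_type looks only at the filename; it returns (None, None)
    # when the extension is unknown.
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type is not None and mime_type.startswith("image/")


png_is_image = is_image_sketch("wind_speed.png")
csv_is_image = is_image_sketch("buoy.csv")
```

Because only the extension is inspected, a mislabeled file would still be classified by its name rather than its contents.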
-
static plot_qc(ds: xarray.Dataset, variable_name: str, filename: str = None, **kwargs) → act.plotting.TimeSeriesDisplay
Create a QC plot for the given variable. This is based on the ACT library: https://arm-doe.github.io/ACT/source/auto_examples/plot_qc.html#sphx-glr-source-auto-examples-plot-qc-py
We provide a convenience wrapper method for basic QC plots of a variable, but we recommend using ACT directly and consulting its examples for more complex plots, such as plotting variables from two different datasets.
TODO: Depending on use cases, we will likely add more arguments to be able to quickly produce the most common types of QC plots.
- Parameters
ds (xr.Dataset) – A dataset
variable_name (str) – The variable to plot
filename (str, optional) – The filename for the image. Saves the plot as this filename if provided.
-
static record_corrections_applied(ds: xarray.Dataset, variable: str, correction: str)
Records a description of a correction made to a variable to the corresponding corrections_applied attribute.
- Parameters
ds (xr.Dataset) – Dataset containing the corrected variable
variable (str) – The name of the variable that was corrected
correction (str) – A description of the correction
-
static