utils
Attributes

`DATASTREAM_TEMPLATE` (module attribute)

```python
DATASTREAM_TEMPLATE = Template(
    "{location_id}.{dataset_name}[-{qualifier}][-{temporal}].{data_level}"
)
```
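The bracketed segments of the template are optional: they are dropped when the corresponding field is not provided. The sketch below illustrates that expansion with plain Python string handling; it is not tsdat's `Template` class, and the field values used are hypothetical.

```python
import re

def expand_template(template: str, **fields: str) -> str:
    """Fill a datastream-style template where "[-{name}]" segments are
    optional: each is dropped entirely when its field is not provided."""
    def replace_optional(match: re.Match) -> str:
        inner = match.group(1)  # e.g. "-{qualifier}"
        name = re.search(r"{(\w+)}", inner).group(1)
        return inner.format(**fields) if fields.get(name) else ""

    # Expand (or drop) each bracketed optional segment first ...
    result = re.sub(r"\[([^\]]*)\]", replace_optional, template)
    # ... then fill the required fields.
    return result.format(**fields)

template = "{location_id}.{dataset_name}[-{qualifier}][-{temporal}].{data_level}"
full = expand_template(
    template,
    location_id="sgp", dataset_name="weather",
    qualifier="raw", temporal="10min", data_level="b1",
)  # "sgp.weather-raw-10min.b1"
minimal = expand_template(
    template, location_id="sgp", dataset_name="weather", data_level="b1"
)  # "sgp.weather.b1"
```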
`FILENAME_TEMPLATE` (module attribute)
Classes

ParameterizedClass
Bases: BaseModel
Base class for any class that accepts 'parameters' as an argument.
Sets the default 'parameters' to {}. Subclasses of ParameterizedClass should override the 'parameters' property to support custom required or optional arguments from configuration files.
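The override pattern might look like the following sketch. A stand-in base class is defined here rather than importing tsdat, and the subclass and parameter names are hypothetical:

```python
from typing import Any, Dict

from pydantic import BaseModel


class ParameterizedClass(BaseModel):
    """Stand-in for tsdat's base class: 'parameters' defaults to {}."""

    parameters: Dict[str, Any] = {}


class Parameters(BaseModel):
    """Custom options a subclass might require from a configuration file."""

    threshold: float = 0.5


class MyCustomHandler(ParameterizedClass):
    # Overriding 'parameters' with a dedicated model means options from
    # configuration files are validated when the object is constructed.
    parameters: Parameters = Parameters()
```

With this override, `MyCustomHandler(parameters={"threshold": 0.9})` validates and coerces the raw dict into a `Parameters` instance.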
StandardsType

Functions

assign_data
Assigns the data to the specified variable in the dataset.
If the variable exists and it is a data variable, then the DataArray for the specified variable in the dataset will simply have its data replaced with the new numpy array. If the variable exists and it is a coordinate variable, then the data will replace the coordinate data. If the variable does not exist in the dataset then a KeyError will be raised.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The dataset where the data should be assigned. | required |
| `data` | `NDArray[Any]` | The data to assign. | required |
| `variable_name` | `str` | The name of the variable in the dataset to assign data to. | required |

Raises:

| Type | Description |
|---|---|
| `KeyError` | Raises a KeyError if the specified variable is not in the dataset's coords or data_vars dictionary. |

Returns:

| Type | Description |
|---|---|
| `Dataset` | `xr.Dataset`: The dataset with data assigned to it. |
Source code in tsdat/utils.py
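For a data variable, the behavior described above amounts to swapping out the variable's backing array. A minimal sketch using xarray directly (variable names are illustrative; tsdat itself is not imported):

```python
import numpy as np
import xarray as xr

# A small dataset with one coordinate ("time") and one data variable ("temp").
dataset = xr.Dataset(
    coords={"time": [0, 1, 2]},
    data_vars={"temp": ("time", [10.0, 11.0, 12.0])},
)

new_temp = np.array([20.0, 21.0, 22.0])

# Replacing a data variable's values in place; coordinate variables are
# handled analogously, and a missing name raises KeyError on lookup.
dataset["temp"].data = new_temp
```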
datetime_substitutions
Source code in tsdat/utils.py
decode_cf

Wrapper around `xarray.decode_cf()` which handles additional edge cases.

This helps ensure that the dataset is formatted and encoded correctly after it has been constructed or modified. Handles edge cases for units and data type encodings on datetime variables.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The dataset to decode. | required |

Returns:

| Type | Description |
|---|---|
| `Dataset` | `xr.Dataset`: The decoded dataset. |
Source code in tsdat/utils.py
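The underlying xarray call can be seen with a dataset whose time values are still raw numbers carrying a CF `units` attribute; decoding converts them to datetimes. (This shows plain `xarray.decode_cf()`, not the extra edge-case handling tsdat layers on top; the example times are made up.)

```python
import xarray as xr

# Time stored as raw offsets with a CF "units" attribute, as is common
# right after a dataset has been constructed.
raw = xr.Dataset(
    {"time": ("time", [0, 60, 120],
              {"units": "seconds since 2022-04-24 09:30:00"})}
)

# Decoding converts the raw offsets into datetime64 values.
decoded = xr.decode_cf(raw)
```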
generate_schema

```python
generate_schema(
    dir: Path = typer.Option(
        Path(".vscode/schema/"),
        file_okay=False,
        dir_okay=True,
    ),
    standards: StandardsType = typer.Option(StandardsType.tsdat),
)
```
Source code in tsdat/utils.py
get_datastream

get_fields_from_dataset

get_fields_from_datastream
Extracts fields from the datastream.
WARNING: this only works for the default datastream template.
Source code in tsdat/utils.py
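Field extraction from a default-template datastream can be sketched with a regular expression. The pattern below mirrors the template `{location_id}.{dataset_name}[-{qualifier}][-{temporal}].{data_level}` but is an illustration, not tsdat's implementation, and the sample datastream name is hypothetical:

```python
import re

# Regex mirroring the default datastream template; optional "-qualifier"
# and "-temporal" segments may be absent.
DATASTREAM_PATTERN = re.compile(
    r"^(?P<location_id>[^.]+)"
    r"\.(?P<dataset_name>[^.\-]+)"
    r"(?:-(?P<qualifier>[^.\-]+))?"
    r"(?:-(?P<temporal>[^.\-]+))?"
    r"\.(?P<data_level>[^.]+)$"
)

def parse_datastream(datastream: str) -> dict:
    match = DATASTREAM_PATTERN.match(datastream)
    if match is None:
        raise ValueError(f"Not a default-template datastream: {datastream}")
    # Drop optional fields that were not present.
    return {k: v for k, v in match.groupdict().items() if v is not None}

fields = parse_datastream("sgp.weather-raw-10min.b1")
```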
get_file_datetime_str
Source code in tsdat/utils.py
get_filename

Returns the standardized filename for the provided dataset.

Returns a key consisting of the dataset's datastream, starting date/time, the extension, and an optional title. For file-based storage systems this method may be used to generate the basename of the output data file by providing the extension as '.nc', '.csv', or some other file ending type. For ancillary plot files this can be used in the same way by specifying the extension as '.png', '.jpeg', etc. and by specifying the title.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The dataset (used to extract the datastream and starting / ending times). | required |
| `extension` | `str` | The file extension that should be used. | required |
| `title` | `Optional[str]` | An optional title that will be placed between the start time and the extension in the generated filename. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `str` | `str` | The filename constructed from provided parameters. |
Source code in tsdat/utils.py
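The naming convention described above can be sketched as a simple join of the pieces. Note that tsdat's `get_filename` derives the datastream and timestamps from the dataset itself; the helper and sample values below are purely illustrative:

```python
from typing import Optional

def make_filename(
    datastream: str,
    start: str,  # e.g. "20220424.093000" (date.time)
    extension: str,
    title: Optional[str] = None,
) -> str:
    """Illustrative sketch of the documented naming convention."""
    ext = extension.lstrip(".")
    # The optional title sits between the start time and the extension.
    parts = [datastream, start] + ([title] if title else []) + [ext]
    return ".".join(parts)

plot_name = make_filename(
    "sgp.weather.b1", "20220424.093000", ".png", title="windspeed"
)  # "sgp.weather.b1.20220424.093000.windspeed.png"
data_name = make_filename("sgp.weather.b1", "20220424.093000", ".nc")
```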
get_start_date_and_time_str

Gets the start date and start time strings from a Dataset.

The strings are formatted using strftime and the following formats:

- date: "%Y%m%d"
- time: "%H%M%S"
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The dataset whose start date and time should be retrieved. | required |

Returns:

| Type | Description |
|---|---|
| `Tuple[str, str]` | `Tuple[str, str]`: The start date and time as strings like "YYYYmmdd", "HHMMSS". |
Source code in tsdat/utils.py
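Applying the two documented strftime formats to a sample start time (the timestamp below is made up for illustration):

```python
from datetime import datetime

# A sample start time and the two documented strftime formats.
start = datetime(2022, 4, 24, 9, 30, 0)
date_str = start.strftime("%Y%m%d")  # "YYYYmmdd"
time_str = start.strftime("%H%M%S")  # "HHMMSS"
```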
get_start_time
Gets the earliest 'time' value and returns it as a pandas Timestamp.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The dataset whose start time should be retrieved. | required |

Returns:

| Type | Description |
|---|---|
| `Timestamp` | `pd.Timestamp`: The timestamp of the earliest time value in the dataset. |
Source code in tsdat/utils.py
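The core idea reduces to taking the minimum of the time values, which pandas returns as a `Timestamp`. A sketch with made-up, deliberately unsorted times:

```python
import pandas as pd

# Times need not be sorted; the start time is simply the minimum value.
times = pd.to_datetime(
    ["2022-04-24 10:00", "2022-04-24 09:30", "2022-04-24 11:15"]
)
start_time = times.min()  # a pd.Timestamp
```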
model_to_dict

Converts the model to a dict with unset optional properties excluded.

Performs a nested union on the dictionaries produced by setting the `exclude_unset` and `exclude_none` options to True for the `model.dict()` method. This allows for the preservation of explicit `None` values in the yaml, while still filtering out values that default to `None`.

Borrowed approximately from https://stackoverflow.com/a/65363852/15641512.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `BaseModel` | The pydantic model to dict-ify. | required |

Returns:

| Type | Description |
|---|---|
| `Dict[Any, Any]` | `Dict[Any, Any]`: The model as a dictionary. |
Source code in tsdat/utils.py
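The nested-union idea can be shown with plain dicts standing in for the two `model.dict()` outputs. The helper and field names below are hypothetical, not tsdat's code:

```python
from typing import Any, Dict

def nested_union(a: Dict[Any, Any], b: Dict[Any, Any]) -> Dict[Any, Any]:
    """Union b into a, recursing into nested dicts instead of overwriting."""
    out = dict(a)
    for key, value in b.items():
        if key in out and isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = nested_union(out[key], value)
        else:
            out[key] = value
    return out

# Suppose 'qualifier' was explicitly set to None in the yaml, while
# 'temporal' was left at its None default:
exclude_unset = {"name": "met", "qualifier": None}  # dict(exclude_unset=True)
exclude_none = {"name": "met", "level": "b1"}       # dict(exclude_none=True)

# The explicit None survives; the default-None 'temporal' never appears.
merged = nested_union(exclude_none, exclude_unset)
```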
record_corrections_applied
Records the message on the 'corrections_applied' attribute.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The corrected dataset. | required |
| `variable_name` | `str` | The name of the variable in the dataset. | required |
| `message` | `str` | The message to record. | required |
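The attribute bookkeeping amounts to appending the message to the variable's 'corrections_applied' attribute. A sketch with xarray (the dataset, variable, and message are illustrative, and this is not tsdat's exact implementation):

```python
import xarray as xr

dataset = xr.Dataset({"temp": ("time", [10.0, 11.0, 12.0])})

# Append the correction message to the variable's attribute, creating
# the list on first use.
message = "Despiked using a rolling median filter."
previous = dataset["temp"].attrs.get("corrections_applied", [])
dataset["temp"].attrs["corrections_applied"] = list(previous) + [message]
```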