tsdat

Subpackages

Package Contents

Classes

- Config: Wrapper for the pipeline configuration file.
- Keys: Class that provides a handle for keys in the pipeline config file.
- QualityManagerDefinition: Wrapper for the quality_management portion of the pipeline config file.
- DatastreamStorage: Base class for providing access to processed data files in a persistent archive.
- FilesystemStorage: DatastreamStorage subclass for a local Linux-based filesystem.
- AwsStorage: DatastreamStorage subclass for an AWS S3-based filesystem.
- Pipeline: Base class for all tsdat data pipelines.
- IngestPipeline: Pipeline that reads in raw, non-standardized data and converts it to a standardized format.
-
class tsdat.Config(dictionary: Dict)
Wrapper for the pipeline configuration file.
Note: in most cases, Config.load(filepath) should be used to instantiate the Config class.
- Parameters
dictionary (Dict) – The pipeline configuration file as a dictionary.
-
_parse_quality_managers(self, dictionary: Dict) → Dict[str, tsdat.config.quality_manager_definition.QualityManagerDefinition]
Extracts QualityManagerDefinitions from the config file.
- Parameters
dictionary (Dict) – The quality_management dictionary.
- Returns
Mapping of quality manager name to QualityManagerDefinition
- Return type
Dict[str, QualityManagerDefinition]
-
classmethod load(self, filepaths: List[str])
Load one or more yaml pipeline configuration files. Multiple files should only be passed as input if the pipeline configuration file is split across multiple files.
- Parameters
filepaths (List[str]) – The path(s) to yaml configuration files to load.
- Returns
A Config object wrapping the yaml configuration file(s).
- Return type
Config
-
static lint_yaml(filename: str)
Lints a yaml file and raises an exception if an error is found.
- Parameters
filename (str) – The path to the file to lint.
- Raises
Exception – Raises an exception if an error is found.
-
class tsdat.Keys
Class that provides a handle for keys in the pipeline config file.
- PIPELINE = pipeline
- DATASET_DEFINITION = dataset_definition
- DEFAULTS = variable_defaults
- QUALITY_MANAGEMENT = quality_management
- ATTRIBUTES = attributes
- DIMENSIONS = dimensions
- VARIABLES = variables
- ALL = ALL
-
class tsdat.QualityManagerDefinition(name: str, dictionary: Dict)
Wrapper for the quality_management portion of the pipeline config file.
- Parameters
name (str) – The name of the quality manager in the config file.
dictionary (Dict) – The dictionary contents of the quality manager from the config file.
-
class tsdat.DatastreamStorage(parameters={})
Bases: abc.ABC
DatastreamStorage is the base class for providing access to processed data files in a persistent archive. DatastreamStorage provides shortcut methods to find files based upon date, datastream name, file type, etc. This is the class that should be used to save and retrieve processed data files. Use the DatastreamStorage.from_config() method to construct the appropriate subclass instance based upon a storage config file.
- default_file_type
- file_filters
- output_file_extensions
-
static from_config(storage_config_file: str)
Load a yaml config file which provides the storage constructor parameters.
- Parameters
storage_config_file (str) – The path to the config file to load
- Returns
A subclass instance created from the config file.
- Return type
DatastreamStorage
-
property tmp(self)
Each subclass should define the tmp property, which provides access to a TemporaryStorage object that is used to efficiently handle reading/writing temporary files used during the processing pipeline, or to perform filesystem actions on files other than processed datastream files that reside in the same filesystem as the DatastreamStorage. It is not intended to be used outside of the pipeline.
- Raises
NotImplementedError – If the subclass does not define this property.
-
abstract find(self, datastream_name: str, start_time: str, end_time: str, filetype: str = None) → List[str]
Finds all files of the given type from the datastream store with the given datastream_name and timestamps from start_time (inclusive) up to end_time (exclusive). Returns a list of paths to files that match the criteria.
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106.000000” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108.000000” to search for data ending before January 8th, 2021.
filetype (str, optional) – A file type from the DatastreamStorage.file_filters keys. If no type is specified, all files will be returned. Defaults to None.
- Returns
A list of paths in datastream storage in ascending order
- Return type
List[str]
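The inclusive-start / exclusive-end semantics above can be illustrated with a small in-memory mock. The helper name, the listing, and the assumed <datastream_name>.<YYYYMMDD>.<HHMMSS>.<ext> filename layout are illustrative only, not part of the tsdat API:

```python
from typing import List, Optional

def find_in_listing(
    filenames: List[str],
    datastream_name: str,
    start_time: str,
    end_time: str,
    filetype: Optional[str] = None,
) -> List[str]:
    """Mimics find() semantics against an in-memory listing: start_time is
    inclusive, end_time is exclusive, results are in ascending order."""
    results = []
    prefix = datastream_name + "."
    for name in filenames:
        if not name.startswith(prefix):
            continue
        # Assumed layout after the datastream name: <YYYYMMDD>.<HHMMSS>.<ext>
        rest = name[len(prefix):].split(".")
        if len(rest) < 3:
            continue
        timestamp = f"{rest[0]}.{rest[1]}"            # e.g. "20210106.000000"
        if not (start_time <= timestamp < end_time):  # inclusive / exclusive
            continue
        if filetype is not None and not name.endswith(filetype):
            continue
        results.append(name)
    return sorted(results)
```

Because the timestamps are zero-padded strings, plain string comparison orders them chronologically, which is why "20210106.000000"-style arguments work.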
-
abstract fetch(self, datastream_name: str, start_time: str, end_time: str, local_path: str = None, filetype: int = None)
Fetches files from the datastream store using the datastream_name, start_time, and end_time to specify the file(s) to retrieve. If the local path is not specified, it is up to the subclass to determine where to put the retrieved file(s).
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108” to search for data ending before January 8th, 2021.
local_path (str, optional) – The path to the directory where the data should be stored. Defaults to None.
filetype (int, optional) – A file type from the DatastreamStorage.file_filters keys. If no type is specified, all files will be returned. Defaults to None.
- Returns
A list of paths where the retrieved files were stored in local storage. This is a context manager class, so this method should be called via the ‘with’ statement, and all files referenced by the list will be cleaned up when it goes out of scope.
- Return type
DisposableLocalTempFileList:
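A minimal sketch of the context-manager behavior described above. The class here is a stand-in, not tsdat's actual DisposableLocalTempFileList: a plain list of local paths whose files are deleted when the ‘with’ block exits:

```python
import os
from typing import List

class DisposableTempFileList(List[str]):
    """Stand-in for the disposable temp-file list described above: a list
    of local file paths that deletes its files when the 'with' scope ends."""

    def __enter__(self) -> "DisposableTempFileList":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Clean up every file referenced by the list when scope ends.
        for path in self:
            if os.path.exists(path):
                os.remove(path)
```

fetch() returns such an object, so callers would write `with storage.fetch(...) as files:` and the local copies vanish once the block is left.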
-
save(self, dataset_or_path: Union[str, xarray.Dataset], new_filename: str = None) → List[Any]
Saves a local file to the datastream store.
- Parameters
dataset_or_path (Union[str, xr.Dataset]) – The dataset or local path to the file to save. The file should be named according to ME Data Standards naming conventions so that this method can automatically parse the datastream, date, and time from the file name.
new_filename (str, optional) – If provided, the new filename to save as. This parameter should ONLY be provided if using a local path for dataset_or_path. Must also follow ME Data Standards naming conventions. Defaults to None.
- Returns
A list of paths where the saved files were stored in storage. Path type is dependent upon the specific storage subclass.
- Return type
List[Any]
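save() relies on the file name encoding the datastream, date, and time. A toy parser is sketched below, assuming an ME-Data-Standards-like layout of <datastream_name>.<YYYYMMDD>.<HHMMSS>.<extension> where the datastream name itself may contain dots; the exact convention may differ:

```python
from typing import Tuple

def parse_standard_filename(filename: str) -> Tuple[str, str, str]:
    """Illustrative parser for a <datastream>.<date>.<time>.<ext> file name.
    Returns (datastream_name, date, time)."""
    parts = filename.split(".")
    # The last part is the extension; the two before it are date and time;
    # everything earlier (possibly dotted) is the datastream name.
    datastream_name = ".".join(parts[:-3])
    date, time = parts[-3], parts[-2]
    return datastream_name, date, time
```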
-
abstract save_local_path(self, local_path: str, new_filename: str = None) → Any
Given a path to a local file, save that file to the storage.
- Parameters
local_path (str) – Local path to the file to save. The file should be named according to ME Data Standards naming conventions so that this method can automatically parse the datastream, date, and time from the file name.
new_filename (str, optional) – If provided, the new filename to save as. This parameter should ONLY be provided if using a local path for dataset_or_path. Must also follow ME Data Standards naming conventions. Defaults to None.
- Returns
The path where this file was stored in storage. Path type is dependent upon the specific storage subclass.
- Return type
Any
-
abstract exists(self, datastream_name: str, start_time: str, end_time: str, filetype: str = None) → bool
Checks if any data exists in the datastream store for the provided datastream and time range.
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108” to search for data ending before January 8th, 2021.
filetype (str, optional) – A file type from the DatastreamStorage.file_filters keys. If none specified, all files will be checked. Defaults to None.
- Returns
True if data exists, False otherwise.
- Return type
bool
-
abstract delete(self, datastream_name: str, start_time: str, end_time: str, filetype: str = None) → None
Deletes datastream data in the datastream store within the specified time range.
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108” to search for data ending before January 8th, 2021.
filetype (str, optional) – A file type from the DatastreamStorage.file_filters keys. If no type is specified, all files will be deleted. Defaults to None.
-
-
class tsdat.FilesystemStorage(parameters={})
Bases: tsdat.io.DatastreamStorage
DatastreamStorage subclass for a local Linux-based filesystem.
TODO: rename to LocalStorage as this is more intuitive.
- Parameters
parameters (dict, optional) –
Dictionary of parameters that should be set automatically from the storage config file when this class is instantiated via the DatastreamStorage.from_config() method. Defaults to {}.
Key parameters that should be set in the config file include:
- retain_input_files: Whether the input files should be cleaned up after they are done processing.
- root_dir: The root path under which processed files will be stored.
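A hypothetical storage config fragment for FilesystemStorage is shown below; only the parameter names come from the list above, and the surrounding layout (including the classname key) is an assumption, so check it against your tsdat version:

```yaml
# Hypothetical storage config sketch -- parameter names from the docs above,
# file layout assumed.
storage:
  classname: tsdat.io.FilesystemStorage
  parameters:
    retain_input_files: True
    root_dir: /data/storage/root
```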
-
property tmp(self)
Each subclass should define the tmp property, which provides access to a TemporaryStorage object that is used to efficiently handle reading/writing temporary files used during the processing pipeline, or to perform filesystem actions on files other than processed datastream files that reside in the same filesystem as the DatastreamStorage. It is not intended to be used outside of the pipeline.
- Raises
NotImplementedError – If the subclass does not define this property.
-
find(self, datastream_name: str, start_time: str, end_time: str, filetype: str = None) → List[str]
Finds all files of the given type from the datastream store with the given datastream_name and timestamps from start_time (inclusive) up to end_time (exclusive). Returns a list of paths to files that match the criteria.
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106.000000” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108.000000” to search for data ending before January 8th, 2021.
filetype (str, optional) – A file type from the DatastreamStorage.file_filters keys. If no type is specified, all files will be returned. Defaults to None.
- Returns
A list of paths in datastream storage in ascending order
- Return type
List[str]
-
fetch(self, datastream_name: str, start_time: str, end_time: str, local_path: str = None, filetype: int = None) → tsdat.io.DisposableLocalTempFileList
Fetches files from the datastream store using the datastream_name, start_time, and end_time to specify the file(s) to retrieve. If the local path is not specified, it is up to the subclass to determine where to put the retrieved file(s).
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108” to search for data ending before January 8th, 2021.
local_path (str, optional) – The path to the directory where the data should be stored. Defaults to None.
filetype (int, optional) – A file type from the DatastreamStorage.file_filters keys. If no type is specified, all files will be returned. Defaults to None.
- Returns
A list of paths where the retrieved files were stored in local storage. This is a context manager class, so this method should be called via the ‘with’ statement, and all files referenced by the list will be cleaned up when it goes out of scope.
- Return type
DisposableLocalTempFileList:
-
save_local_path(self, local_path: str, new_filename: str = None) → Any
Given a path to a local file, save that file to the storage.
- Parameters
local_path (str) – Local path to the file to save. The file should be named according to ME Data Standards naming conventions so that this method can automatically parse the datastream, date, and time from the file name.
new_filename (str, optional) – If provided, the new filename to save as. This parameter should ONLY be provided if using a local path for dataset_or_path. Must also follow ME Data Standards naming conventions. Defaults to None.
- Returns
The path where this file was stored in storage. Path type is dependent upon the specific storage subclass.
- Return type
Any
-
exists(self, datastream_name: str, start_time: str, end_time: str, filetype: int = None) → bool
Checks if any data exists in the datastream store for the provided datastream and time range.
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108” to search for data ending before January 8th, 2021.
filetype (str, optional) – A file type from the DatastreamStorage.file_filters keys. If none specified, all files will be checked. Defaults to None.
- Returns
True if data exists, False otherwise.
- Return type
bool
-
delete(self, datastream_name: str, start_time: str, end_time: str, filetype: int = None) → None
Deletes datastream data in the datastream store within the specified time range.
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108” to search for data ending before January 8th, 2021.
filetype (str, optional) – A file type from the DatastreamStorage.file_filters keys. If no type is specified, all files will be deleted. Defaults to None.
-
class tsdat.AwsStorage(parameters={})
Bases: tsdat.io.DatastreamStorage
DatastreamStorage subclass for an AWS S3-based filesystem.
- Parameters
parameters (dict, optional) –
Dictionary of parameters that should be set automatically from the storage config file when this class is instantiated via the DatastreamStorage.from_config() method. Defaults to {}.
Key parameters that should be set in the config file include:
- retain_input_files: Whether the input files should be cleaned up after they are done processing.
- root_dir: The bucket ‘key’ to prepend to all processed files created in the persistent store. Defaults to ‘root’.
- temp_dir: The bucket ‘key’ to prepend to all temp files created in the S3 bucket. Defaults to ‘temp’.
- bucket_name: The name of the S3 bucket to store to.
-
property s3_resource(self)
-
property s3_client(self)
-
property tmp(self)
Each subclass should define the tmp property, which provides access to a TemporaryStorage object that is used to efficiently handle reading/writing temporary files used during the processing pipeline, or to perform filesystem actions on files other than processed datastream files that reside in the same filesystem as the DatastreamStorage. It is not intended to be used outside of the pipeline.
- Raises
NotImplementedError – If the subclass does not define this property.
-
property root(self)
-
property temp_path(self)
-
find(self, datastream_name: str, start_time: str, end_time: str, filetype: str = None) → List[S3Path]
Finds all files of the given type from the datastream store with the given datastream_name and timestamps from start_time (inclusive) up to end_time (exclusive). Returns a list of paths to files that match the criteria.
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106.000000” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108.000000” to search for data ending before January 8th, 2021.
filetype (str, optional) – A file type from the DatastreamStorage.file_filters keys. If no type is specified, all files will be returned. Defaults to None.
- Returns
A list of paths in datastream storage in ascending order
- Return type
List[S3Path]
-
fetch(self, datastream_name: str, start_time: str, end_time: str, local_path: str = None, filetype: int = None) → tsdat.io.DisposableLocalTempFileList
Fetches files from the datastream store using the datastream_name, start_time, and end_time to specify the file(s) to retrieve. If the local path is not specified, it is up to the subclass to determine where to put the retrieved file(s).
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108” to search for data ending before January 8th, 2021.
local_path (str, optional) – The path to the directory where the data should be stored. Defaults to None.
filetype (int, optional) – A file type from the DatastreamStorage.file_filters keys. If no type is specified, all files will be returned. Defaults to None.
- Returns
A list of paths where the retrieved files were stored in local storage. This is a context manager class, so this method should be called via the ‘with’ statement, and all files referenced by the list will be cleaned up when it goes out of scope.
- Return type
DisposableLocalTempFileList:
-
save_local_path(self, local_path: str, new_filename: str = None)
Given a path to a local file, save that file to the storage.
- Parameters
local_path (str) – Local path to the file to save. The file should be named according to ME Data Standards naming conventions so that this method can automatically parse the datastream, date, and time from the file name.
new_filename (str, optional) – If provided, the new filename to save as. This parameter should ONLY be provided if using a local path for dataset_or_path. Must also follow ME Data Standards naming conventions. Defaults to None.
- Returns
The path where this file was stored in storage. Path type is dependent upon the specific storage subclass.
- Return type
Any
-
exists(self, datastream_name: str, start_time: str, end_time: str, filetype: int = None) → bool
Checks if any data exists in the datastream store for the provided datastream and time range.
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108” to search for data ending before January 8th, 2021.
filetype (str, optional) – A file type from the DatastreamStorage.file_filters keys. If none specified, all files will be checked. Defaults to None.
- Returns
True if data exists, False otherwise.
- Return type
bool
-
delete(self, datastream_name: str, start_time: str, end_time: str, filetype: int = None) → None
Deletes datastream data in the datastream store within the specified time range.
- Parameters
datastream_name (str) – The datastream_name as defined by ME Data Standards.
start_time (str) – The start time or date to start searching for data (inclusive). Should be like “20210106” to search for data beginning on or after January 6th, 2021.
end_time (str) – The end time or date to stop searching for data (exclusive). Should be like “20210108” to search for data ending before January 8th, 2021.
filetype (str, optional) – A file type from the DatastreamStorage.file_filters keys. If no type is specified, all files will be deleted. Defaults to None.
-
class tsdat.Pipeline(pipeline_config: Union[str, tsdat.config.Config], storage_config: Union[str, tsdat.io.DatastreamStorage])
Bases: abc.ABC
This class serves as the base class for all tsdat data pipelines.
- Parameters
pipeline_config (Union[str, Config]) – The pipeline config file. Can be either a config object, or the path to the pipeline config file that should be used with this pipeline.
storage_config (Union[str, DatastreamStorage]) – The storage config file. Can be either a config object, or the path to the storage config file that should be used with this pipeline.
-
abstract run(self, filepath: Union[str, List[str]])
This method is the entry point for the pipeline. It will take one or more file paths and process them from start to finish. All classes extending the Pipeline class must implement this method.
- Parameters
filepath (Union[str, List[str]]) – The path or list of paths to the file(s) to run the pipeline on.
-
standardize_dataset(self, raw_mapping: Dict[str, xarray.Dataset]) → xarray.Dataset
Standardizes the dataset by applying variable name and units conversions as defined by the pipeline config file. This method returns the standardized dataset.
- Parameters
raw_mapping (Dict[str, xr.Dataset]) – The raw dataset mapping.
- Returns
The standardized dataset.
- Return type
xr.Dataset
-
check_required_variables(self, dataset: xarray.Dataset, dod: tsdat.config.DatasetDefinition)
Throws an error if a required variable could not be retrieved.
- Parameters
dataset (xr.Dataset) – The dataset to check.
dod (DatasetDefinition) – The DatasetDefinition used to specify required variables.
- Raises
Exception – Raises an exception to indicate the variable could not be retrieved.
-
add_static_variables(self, dataset: xarray.Dataset, dod: tsdat.config.DatasetDefinition) → xarray.Dataset
Uses the DatasetDefinition to add static variables (variables whose data are defined in the pipeline config file) to the output dataset.
- Parameters
dataset (xr.Dataset) – The dataset to add static variables to.
dod (DatasetDefinition) – The DatasetDefinition to pull data from.
- Returns
The original dataset with added variables from the config
- Return type
xr.Dataset
-
add_missing_variables(self, dataset: xarray.Dataset, dod: tsdat.config.DatasetDefinition) → xarray.Dataset
Uses the dataset definition to initialize variables that are defined in the dataset definition but did not have input. Uses the appropriate shape and _FillValue to initialize each variable.
- Parameters
dataset (xr.Dataset) – The dataset to add the variables to.
dod (DatasetDefinition) – The DatasetDefinition to use.
- Returns
The original dataset with variables that still need to be initialized, initialized.
- Return type
xr.Dataset
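The fill-in behavior can be sketched with plain Python in place of xarray objects; the function name, dict layout, and default fill value below are illustrative only:

```python
from typing import Any, Dict, List

def fill_missing_variables(
    dataset: Dict[str, List[Any]],
    defined_vars: Dict[str, Dict[str, Any]],
    length: int,
) -> Dict[str, List[Any]]:
    """Toy version of the behavior described above: any variable present in
    the definition but absent from the dataset is created with the right
    shape and filled with its _FillValue."""
    for name, definition in defined_vars.items():
        if name not in dataset:
            fill = definition.get("_FillValue", -9999)  # assumed default
            dataset[name] = [fill] * length
    return dataset
```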
-
add_attrs(self, dataset: xarray.Dataset, raw_mapping: Dict[str, xarray.Dataset], dod: tsdat.config.DatasetDefinition) → xarray.Dataset
Adds global and variable-level attributes to the dataset from the DatasetDefinition object.
- Parameters
dataset (xr.Dataset) – The dataset to add attributes to.
raw_mapping (Dict[str, xr.Dataset]) – The raw dataset mapping. Used to set the input_files global attribute.
dod (DatasetDefinition) – The DatasetDefinition containing the attributes to add.
- Returns
The original dataset with the attributes added.
- Return type
xr.Dataset
-
get_previous_dataset(self, dataset: xarray.Dataset) → xarray.Dataset
Utility method to retrieve the previous set of data for the same datastream as the provided dataset from the DatastreamStorage.
- Parameters
dataset (xr.Dataset) – The reference dataset that will be used to search the DatastreamStore for prior data.
- Returns
The previous dataset from the DatastreamStorage if it exists, otherwise None.
- Return type
xr.Dataset
-
reduce_raw_datasets(self, raw_mapping: Dict[str, xarray.Dataset], definition: tsdat.config.DatasetDefinition) → List[xarray.Dataset]
Removes unused variables from each raw dataset in the raw mapping and performs input to output naming and unit conversions as defined in the dataset definition.
- Parameters
raw_mapping (Dict[str, xr.Dataset]) – The raw xarray dataset mapping.
definition (DatasetDefinition) – The DatasetDefinition used to select the variables to keep.
- Returns
A list of reduced datasets.
- Return type
List[xr.Dataset]
-
reduce_raw_dataset(self, raw_dataset: xarray.Dataset, variable_definitions: List[tsdat.config.VariableDefinition], definition: tsdat.config.DatasetDefinition) → xarray.Dataset
Removes unused variables from the raw dataset provided and keeps only the variables and coordinates pertaining to the provided variable definitions. Also performs input to output naming and unit conversions as defined in the DatasetDefinition.
- Parameters
raw_dataset (xr.Dataset) – The raw dataset mapping.
variable_definitions (List[VariableDefinition]) – List of variables to keep.
definition (DatasetDefinition) – The DatasetDefinition used to select the variables to keep.
- Returns
The reduced dataset.
- Return type
xr.Dataset
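A toy illustration of the reduction described above, using plain dicts and callables instead of xarray objects and tsdat definitions; all names here are assumptions, not tsdat API:

```python
from typing import Callable, Dict, List

def reduce_raw(
    raw: Dict[str, List[float]],
    keep: Dict[str, str],
    converters: Dict[str, Callable[[float], float]],
) -> Dict[str, List[float]]:
    """Keep only the mapped variables, rename raw names to output names,
    and apply a unit conversion per output variable where one is given."""
    reduced = {}
    for raw_name, out_name in keep.items():
        convert = converters.get(out_name, lambda v: v)  # identity if none
        reduced[out_name] = [convert(v) for v in raw[raw_name]]
    return reduced
```

For example, a raw column named "Temp (degF)" could be kept as "temperature" with a Fahrenheit-to-Celsius converter, while unmapped raw variables are dropped.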
-
store_and_reopen_dataset(self, dataset: xarray.Dataset) → xarray.Dataset
Uses the DatastreamStorage object to persist the dataset in the format specified by the storage config file.
- Parameters
dataset (xr.Dataset) – The dataset to store.
- Returns
The dataset after it has been saved to disk and reopened.
- Return type
xr.Dataset
-
class tsdat.IngestPipeline(pipeline_config: Union[str, tsdat.config.Config], storage_config: Union[str, tsdat.io.DatastreamStorage])
Bases: tsdat.pipeline.pipeline.Pipeline
The IngestPipeline class is designed to read in raw, non-standardized data and convert it to a standardized format by embedding metadata, applying quality checks and quality controls, and saving the processed data in a standard file format.
-
run(self, filepath: Union[str, List[str]]) → None
Runs the IngestPipeline from start to finish.
- Parameters
filepath (Union[str, List[str]]) – The path or list of paths to the file(s) to run the pipeline on.
-
hook_customize_dataset(self, dataset: xarray.Dataset, raw_mapping: Dict[str, xarray.Dataset]) → xarray.Dataset
Hook to allow for user customizations to the standardized dataset, such as inserting a derived variable based on other variables in the dataset. This method is called immediately after the standardize_dataset method and before QualityManagement has been run.
- Parameters
dataset (xr.Dataset) – The dataset to customize.
raw_mapping (Dict[str, xr.Dataset]) – The raw dataset mapping.
- Returns
The customized dataset.
- Return type
xr.Dataset
-
hook_customize_raw_datasets(self, raw_dataset_mapping: Dict[str, xarray.Dataset]) → Dict[str, xarray.Dataset]
Hook to allow for user customizations to one or more raw xarray Datasets before they are merged and used to create the standardized dataset. The raw_dataset_mapping will contain one entry for each file being used as input to the pipeline. The keys are the standardized raw file names, and the values are the datasets.
This method would typically only be used if the user is combining multiple files into a single dataset. In this case, this method may be used to correct coordinates if they don’t match for all the files, or to change variable (column) names if two files have the same name for a variable, but they are two distinct variables.
This method can also be used to check for unique conditions in the raw data that should cause a pipeline failure if they are not met.
This method is called before the inputs are merged and converted to standard format as specified by the config file.
- Parameters
raw_dataset_mapping (Dict[str, xr.Dataset]) – The raw datasets to customize.
- Returns
The customized raw datasets.
- Return type
Dict[str, xr.Dataset]
-
hook_finalize_dataset(self, dataset: xarray.Dataset) → xarray.Dataset
Hook to apply any final customizations to the dataset before it is saved. This hook is called after QualityManagement has been run and immediately before the dataset is saved to file.
- Parameters
dataset (xr.Dataset) – The dataset to finalize.
- Returns
The finalized dataset to save.
- Return type
xr.Dataset
-
hook_generate_and_persist_plots(self, dataset: xarray.Dataset) → None
Hook to allow users to create plots from the xarray dataset after the dataset has been finalized and just before the dataset is saved to disk.
To save on filesystem space (which is limited when running on the cloud via a lambda function), this method should only write one plot to local storage at a time. An example of how this could be done is below:

filename = DSUtil.get_plot_filename(dataset, "sea_level", "png")
with self.storage._tmp.get_temp_filepath(filename) as tmp_path:
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(dataset["time"].data, dataset["sea_level"].data)
    fig.savefig(tmp_path)
    storage.save(tmp_path)

filename = DSUtil.get_plot_filename(dataset, "qc_sea_level", "png")
with self.storage._tmp.get_temp_filepath(filename) as tmp_path:
    fig, ax = plt.subplots(figsize=(10, 5))
    DSUtil.plot_qc(dataset, "sea_level", tmp_path)
    storage.save(tmp_path)
- Parameters
dataset (xr.Dataset) – The xarray dataset with customizations and QualityManagement applied.
-
read_and_persist_raw_files(self, file_paths: List[str]) → List[str]
Renames the provided raw files according to ME Data Standards file naming conventions for raw data files, and returns a list of the paths to the renamed files.
- Parameters
file_paths (List[str]) – A list of paths to the original raw files.
- Returns
A list of paths to the renamed files.
- Return type
List[str]
-
-
exception tsdat.QCError
Bases: Exception
Indicates that a given Quality Manager failed with a fatal error.
-
exception tsdat.DefinitionError
Bases: Exception
Indicates a fatal error within the YAML Dataset Definition.