tsdat.io.base

Classes

DataConverter

Base class for running data conversions on the retrieved raw dataset.

DataHandler

Class that groups a DataReader subclass and a DataWriter subclass together to provide a unified approach to data I/O.

DataReader

Base class for reading data from an input source.

DataWriter

Base class for writing data to storage area(s).

FileHandler

DataHandler specifically tailored to reading and writing files of a specific type.

FileWriter

Base class for file-based DataWriters.

Retriever

Base class for retrieving data used as input to tsdat pipelines.

Storage

Abstract base class for the tsdat Storage API. Subclasses of Storage are used in pipelines to persist data and ancillary files (e.g., plots).

class tsdat.io.base.DataConverter

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for running data conversions on the retrieved raw dataset.

Class Methods

convert

Runs the data converter on the provided (retrieved) dataset.

Method Descriptions

abstract convert(self, dataset: xarray.Dataset, dataset_config: tsdat.config.dataset.DatasetConfig, variable_name: str, **kwargs: Any) → xarray.Dataset

Runs the data converter on the provided (retrieved) dataset.

Parameters
  • dataset (xr.Dataset) – The dataset to convert.

  • dataset_config (DatasetConfig) – The dataset configuration.

  • variable_name (str) – The name of the variable to convert.

Returns

The converted dataset.

Return type

xr.Dataset
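The convert() contract can be illustrated with a minimal, standard-library-only sketch. It is not tsdat's implementation: the DataConverter below is a local stand-in that mirrors only the documented abstract signature (the real class also derives from tsdat.utils.ParameterizedClass), a plain dict of lists stands in for xr.Dataset, and the ScaleConverter subclass with its factor parameter is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Dataset = Dict[str, List[float]]  # stand-in for xr.Dataset: variable name -> values


class DataConverter(ABC):
    """Local stand-in mirroring the documented abstract convert() signature."""

    @abstractmethod
    def convert(
        self, dataset: Dataset, dataset_config: Any, variable_name: str, **kwargs: Any
    ) -> Dataset:
        ...


class ScaleConverter(DataConverter):
    """Hypothetical converter that multiplies one variable by a constant factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def convert(
        self, dataset: Dataset, dataset_config: Any, variable_name: str, **kwargs: Any
    ) -> Dataset:
        # Shallow-copy so the caller's retrieved dataset is left untouched.
        converted = dict(dataset)
        converted[variable_name] = [v * self.factor for v in dataset[variable_name]]
        return converted


raw = {"time": [0.0, 1.0], "pressure_hpa": [1013.0, 1012.5]}
out = ScaleConverter(factor=100.0).convert(raw, dataset_config=None, variable_name="pressure_hpa")
print(out["pressure_hpa"])  # [101300.0, 101250.0]
```

Returning a new mapping rather than mutating the input keeps a converter safe to apply to a dataset that other steps may still read.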

class tsdat.io.base.DataHandler

Bases: tsdat.utils.ParameterizedClass

Class that groups a DataReader subclass and a DataWriter subclass together to provide a unified approach to data I/O.

Parameters
  • reader (DataReader) – The DataReader subclass responsible for handling input data reading.

  • writer (DataWriter) – The DataWriter subclass responsible for handling output data writing.

parameters: Any
reader: DataReader
writer: DataWriter
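
The grouping idea can be sketched as a small dataclass holding one reader and one writer, so the same object serves both directions of I/O. This is a hedged illustration only: DataHandler here is a local stand-in, JsonReader and JsonWriter are hypothetical classes, and a dict of lists stands in for xr.Dataset so the snippet runs with the standard library alone.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List

Dataset = Dict[str, List[Any]]  # stand-in for xr.Dataset


class JsonReader:
    """Hypothetical reader: loads a dataset from a JSON file path."""

    def read(self, input_key: str) -> Dataset:
        return json.loads(Path(input_key).read_text())


class JsonWriter:
    """Hypothetical writer: saves a dataset to a JSON file path."""

    def write(self, dataset: Dataset, filepath: Path) -> None:
        filepath.write_text(json.dumps(dataset))


@dataclass
class DataHandler:
    """Stand-in that pairs one reader with one writer for unified I/O."""

    reader: JsonReader
    writer: JsonWriter


handler = DataHandler(reader=JsonReader(), writer=JsonWriter())
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.json"
    handler.writer.write({"time": [0, 1], "temp": [20.5, 21.0]}, filepath=path)
    round_trip = handler.reader.read(str(path))
print(round_trip)  # {'time': [0, 1], 'temp': [20.5, 21.0]}
```

Pairing a reader and writer that share a format means whatever the handler writes, it can also read back.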

class tsdat.io.base.DataReader

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for reading data from an input source.

Parameters
  • regex (Pattern[str]) – The regex pattern associated with the DataReader. If calling the DataReader from a tsdat pipeline, this pattern will be checked against each possible input key before the read() method is called.

Class Methods

read

Uses the input key to open a resource and load data as an xr.Dataset object or as a mapping of strings to xr.Dataset objects.

Method Descriptions

abstract read(self, input_key: str) → Union[xarray.Dataset, Dict[str, xarray.Dataset]]

Uses the input key to open a resource and load data as an xr.Dataset object or as a mapping of strings to xr.Dataset objects.

In most cases DataReaders will only need to return a single xr.Dataset, but occasionally some types of inputs necessitate that the data loaded from the input_key be returned as a mapping. For example, if the input_key is a path to a zip file containing multiple disparate datasets, then returning a mapping is appropriate.

Parameters
  • input_key (str) – An input key matching the DataReader’s regex pattern that should be used to load data.

Returns

The raw data extracted from the provided input key.

Return type

Union[xr.Dataset, Dict[str, xr.Dataset]]
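The zip-file case described above, together with the regex check made before read() is called, can be sketched with the standard library. This is an assumption-laden illustration, not tsdat code: ZipCsvReader and read_matching are hypothetical names, and each "dataset" is a dict of CSV columns standing in for xr.Dataset.

```python
import csv
import io
import re
import tempfile
import zipfile
from pathlib import Path
from typing import Dict, List

Dataset = Dict[str, List[str]]  # stand-in for xr.Dataset: column name -> values


class ZipCsvReader:
    """Hypothetical reader returning one dataset per CSV member of a zip file."""

    regex = re.compile(r".*\.zip$")

    def read(self, input_key: str) -> Dict[str, Dataset]:
        datasets: Dict[str, Dataset] = {}
        with zipfile.ZipFile(input_key) as archive:
            for name in archive.namelist():
                table = csv.DictReader(io.StringIO(archive.read(name).decode("utf-8")))
                rows = list(table)
                datasets[name] = {col: [row[col] for row in rows] for col in table.fieldnames or []}
        return datasets


def read_matching(reader: ZipCsvReader, input_keys: List[str]) -> Dict[str, Dict[str, Dataset]]:
    # Only keys matching the reader's regex are passed to read().
    return {key: reader.read(key) for key in input_keys if reader.regex.match(key)}


with tempfile.TemporaryDirectory() as tmp:
    zip_path = str(Path(tmp) / "bundle.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("wind.csv", "time,speed\n0,3.2\n1,4.1\n")
        archive.writestr("temp.csv", "time,temp\n0,20.5\n")
    results = read_matching(ZipCsvReader(), [zip_path, str(Path(tmp) / "notes.txt")])

print(results[zip_path]["wind.csv"]["speed"])  # ['3.2', '4.1']
```

The non-matching notes.txt key is skipped without ever being opened, which is why the file does not need to exist.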

class tsdat.io.base.DataWriter

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for writing data to storage area(s).

Class Methods

write

Writes the dataset to the storage area. This method is typically called by the tsdat storage API.

Method Descriptions

abstract write(self, dataset: xarray.Dataset, **kwargs: Any) → None

Writes the dataset to the storage area. This method is typically called by the tsdat storage API, which will be responsible for providing any additional parameters required by subclasses of the tsdat.io.base.DataWriter class.

Parameters

dataset (xr.Dataset) – The dataset to save.

class tsdat.io.base.FileHandler

Bases: DataHandler

DataHandler specifically tailored to reading and writing files of a specific type.

Parameters
  • reader (DataReader) – The DataReader subclass responsible for handling input data reading.

  • writer (FileWriter) – The FileWriter subclass responsible for handling output data writing.

reader: DataReader
writer: FileWriter

class tsdat.io.base.FileWriter

Bases: DataWriter, abc.ABC

Base class for file-based DataWriters.

Parameters
  • file_extension (str) – The file extension that the FileHandler should be used for, e.g., ".nc", ".csv", ...

file_extension: str

Class Methods

write

Writes the dataset to the provided filepath. This method is typically called by the tsdat storage API.

Method Descriptions

abstract write(self, dataset: xarray.Dataset, filepath: Optional[pathlib.Path] = None, **kwargs: Any) → None

Writes the dataset to the provided filepath. This method is typically called by the tsdat storage API, which will be responsible for providing the filepath, including the file extension.

Parameters
  • dataset (xr.Dataset) – The dataset to save.

  • filepath (Optional[Path]) – The path to the file to save.
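The division of labor described above, where the storage layer builds the full path (extension included) and the writer only serializes, can be sketched as follows. This is a hedged, standard-library-only illustration: CsvWriter and save are hypothetical names, the file_extension class attribute mirrors the documented attribute, and a dict of lists stands in for xr.Dataset.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional

Dataset = Dict[str, List[Any]]  # stand-in for xr.Dataset


class CsvWriter:
    """Hypothetical FileWriter-style class that writes one CSV column per variable."""

    file_extension = ".csv"

    def write(self, dataset: Dataset, filepath: Optional[Path] = None, **kwargs: Any) -> None:
        if filepath is None:
            raise ValueError("a filepath is required")
        columns = list(dataset)
        with open(filepath, "w", newline="") as f:
            out = csv.writer(f)
            out.writerow(columns)
            out.writerows(zip(*(dataset[c] for c in columns)))


def save(writer: CsvWriter, dataset: Dataset, directory: Path, stem: str) -> Path:
    # The caller (standing in for the storage API) decides the full path and extension.
    filepath = directory / f"{stem}{writer.file_extension}"
    writer.write(dataset, filepath=filepath)
    return filepath


with tempfile.TemporaryDirectory() as tmp:
    path = save(CsvWriter(), {"time": [0, 1], "temp": [20.5, 21.0]}, Path(tmp), "site.temp.a1")
    name = path.name
    content = path.read_text()

print(name)  # site.temp.a1.csv
print(content.splitlines())  # ['time,temp', '0,20.5', '1,21.0']
```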

class tsdat.io.base.Retriever

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Base class for retrieving data used as input to tsdat pipelines.

Parameters
  • readers (Dict[str, DataReader]) – The mapping of readers that should be used to retrieve data given input_keys and optional keyword arguments provided by subclasses of Retriever.

readers: Dict[Pattern, Any]

Mapping of readers that should be used to read data given input keys.

Class Methods

retrieve

Prepares the raw dataset mapping for use in downstream pipeline processes by consolidating the data into a single xr.Dataset object.

Method Descriptions

abstract retrieve(self, input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, **kwargs: Any) → xarray.Dataset

Prepares the raw dataset mapping for use in downstream pipeline processes by consolidating the data into a single xr.Dataset object. The retrieved dataset may contain additional coords and data_vars that are not defined in the output dataset. Input data converters are applied as part of the preparation process.

Parameters
  • input_keys (List[str]) – The input keys the registered DataReaders should read from.

  • dataset_config (DatasetConfig) – The specification of the output dataset.

Returns

The retrieved dataset.

Return type

xr.Dataset
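The retrieval flow described above (route each input key to a reader by regex, consolidate the results into one dataset, then apply input converters) can be sketched with the standard library. This is a simplified, hypothetical illustration rather than tsdat's retriever: SimpleRetriever is an invented name, the readers and the degF-to-degC converter are lambdas, a dict of lists stands in for xr.Dataset, and dataset_config is accepted but unused. A real retriever would also map raw names onto the output dataset specification.

```python
from __future__ import annotations

import re
from typing import Any, Callable, Dict, List

Dataset = Dict[str, List[float]]  # stand-in for xr.Dataset: variable name -> values
Reader = Callable[[str], Dataset]
Converter = Callable[[List[float]], List[float]]


class SimpleRetriever:
    """Hypothetical retriever: routes each input key to a reader by regex, then merges."""

    def __init__(self, readers: Dict[re.Pattern[str], Reader], converters: Dict[str, Converter]) -> None:
        self.readers = readers
        self.converters = converters

    def retrieve(self, input_keys: List[str], dataset_config: Any = None, **kwargs: Any) -> Dataset:
        # dataset_config mirrors the documented signature but is unused in this sketch.
        merged: Dataset = {}
        for key in input_keys:
            for pattern, reader in self.readers.items():
                if pattern.match(key):  # first matching pattern wins
                    merged.update(reader(key))
                    break
        # Apply input converters to the consolidated data.
        for name, convert in self.converters.items():
            if name in merged:
                merged[name] = convert(merged[name])
        return merged


retriever = SimpleRetriever(
    readers={
        re.compile(r"met\..*\.csv$"): lambda key: {"temp": [68.0, 77.0]},
        re.compile(r"wind\..*\.txt$"): lambda key: {"speed": [3.0]},
    },
    converters={"temp": lambda values: [(v - 32) * 5 / 9 for v in values]},
)
dataset = retriever.retrieve(["met.20220101.csv", "wind.20220101.txt", "notes.log"])
print(dataset)  # {'temp': [20.0, 25.0], 'speed': [3.0]}
```

Keys that match no pattern (notes.log here) are silently ignored, and variables from later inputs overwrite earlier ones with the same name. Both are simplifications chosen for brevity.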

class tsdat.io.base.Storage

Bases: tsdat.utils.ParameterizedClass, abc.ABC

Abstract base class for the tsdat Storage API. Subclasses of Storage are used in pipelines to persist data and ancillary files (e.g., plots).

Parameters
  • parameters (Any) – Configuration parameters for the Storage API. The specific parameters that are allowed will be defined by subclasses of this base class.

  • handler (DataHandler) – The DataHandler responsible for handling both read and write operations needed by the storage API.

handler: DataHandler

Defines methods for reading and writing datasets from the storage area.

parameters: Any

(Internal) parameters used by the storage API that can be set through configuration files, environment variables, or other means.

Class Methods

fetch_data

Fetches a dataset from the storage area where the dataset’s time span is between the specified start and end times.

save_ancillary_file

Saves an ancillary file (e.g., a plot, non-dataset metadata file, etc.) to the storage area for the specified datastream.

save_data

Saves the dataset to the storage area.

uploadable_dir

Context manager that can be used to upload many ancillary files at once.

Method Descriptions

abstract fetch_data(self, start: datetime.datetime, end: datetime.datetime, datastream: str) → xarray.Dataset

Fetches a dataset from the storage area where the dataset’s time span is between the specified start and end times.

Parameters
  • start (datetime) – The start time bound.

  • end (datetime) – The end time bound.

  • datastream (str) – The name of the datastream to fetch.

Returns

The fetched dataset.

Return type

xr.Dataset

abstract save_ancillary_file(self, filepath: pathlib.Path, datastream: str)

Saves an ancillary file (e.g., a plot, non-dataset metadata file, etc.) to the storage area for the specified datastream.

Parameters
  • filepath (Path) – Where the file that should be saved is currently located.

  • datastream (str) – The datastream that the ancillary file is associated with.

abstract save_data(self, dataset: xarray.Dataset)

Saves the dataset to the storage area.

Parameters

dataset (xr.Dataset) – The dataset to save.
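The save_data() and fetch_data() contracts can be sketched with an in-memory store. This is a hypothetical illustration, not a tsdat storage backend: InMemoryStorage is an invented name, and the stand-in dataset is a dict that carries its datastream name under a "datastream" key plus a "time" list of datetimes (an assumption that lets save_data take only the dataset, as documented). The documentation does not say whether the time bounds are inclusive; this sketch keeps samples with start <= time < end.

```python
from datetime import datetime
from typing import Any, Dict, List

Dataset = Dict[str, Any]  # stand-in for xr.Dataset: "datastream", "time", and data variables


class InMemoryStorage:
    """Hypothetical storage that keeps saved datasets in memory, grouped by datastream."""

    def __init__(self) -> None:
        self._store: Dict[str, List[Dataset]] = {}

    def save_data(self, dataset: Dataset) -> None:
        # The datastream name travels with the dataset, so no other argument is needed.
        self._store.setdefault(dataset["datastream"], []).append(dataset)

    def fetch_data(self, start: datetime, end: datetime, datastream: str) -> Dataset:
        out: Dataset = {"datastream": datastream, "time": []}
        for ds in self._store.get(datastream, []):
            # Bounds convention chosen for this sketch: start <= time < end.
            keep = [i for i, t in enumerate(ds["time"]) if start <= t < end]
            out["time"].extend(ds["time"][i] for i in keep)
            for name, values in ds.items():
                if name not in ("datastream", "time"):
                    out.setdefault(name, []).extend(values[i] for i in keep)
        return out


storage = InMemoryStorage()
storage.save_data({
    "datastream": "site.temp.a1",
    "time": [datetime(2022, 1, 1, 0), datetime(2022, 1, 1, 12)],
    "temp": [20.5, 21.0],
})
storage.save_data({"datastream": "site.temp.a1", "time": [datetime(2022, 1, 2, 0)], "temp": [19.0]})
result = storage.fetch_data(datetime(2022, 1, 1, 6), datetime(2022, 1, 2, 6), "site.temp.a1")
print(result["temp"])  # [21.0, 19.0]
```

A fetch can span several saved datasets, so the sketch concatenates the matching samples from each one in the order they were saved.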

uploadable_dir(self, datastream: str) → Generator[pathlib.Path, None, None]

Context manager that can be used to upload many ancillary files at once. This method yields the path to a temporary directory whose contents will be saved to the storage area using the save_ancillary_file method upon exiting the context manager.

Parameters
  • datastream (str) – The datastream associated with any files written to the uploadable directory.

Yields

Generator[Path, None, None] – A temporary directory whose contents should be saved to the storage area.
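The uploadable_dir() behavior (yield a temporary directory, then pass each file in it to save_ancillary_file on exit) maps naturally onto contextlib.contextmanager. The sketch below is hypothetical: LocalStorage is an invented class whose save_ancillary_file simply copies files into a local root folder grouped by datastream.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Generator


class LocalStorage:
    """Hypothetical storage that copies ancillary files into a local root folder."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def save_ancillary_file(self, filepath: Path, datastream: str) -> None:
        dest = self.root / datastream / filepath.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(filepath, dest)

    @contextmanager
    def uploadable_dir(self, datastream: str) -> Generator[Path, None, None]:
        with tempfile.TemporaryDirectory() as tmp:
            tmp_dir = Path(tmp)
            yield tmp_dir
            # Runs only if the with-block exits normally: save every file written there.
            for path in sorted(tmp_dir.iterdir()):
                if path.is_file():
                    self.save_ancillary_file(path, datastream)


with tempfile.TemporaryDirectory() as root:
    storage = LocalStorage(Path(root))
    with storage.uploadable_dir("site.temp.a1") as tmp_dir:
        (tmp_dir / "temp.png").write_bytes(b"fake image")
        (tmp_dir / "notes.txt").write_text("qc summary")
    saved = sorted(p.name for p in (Path(root) / "site.temp.a1").iterdir())

print(saved)  # ['notes.txt', 'temp.png']
```

If the body of the with statement raises, the exception is re-raised at the yield, so nothing is uploaded, while the temporary directory is still cleaned up.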