tsdat.io.base
Classes
DataConverter | Base class for running data conversions on a retrieved raw dataset.
DataHandler | Class that groups a DataReader subclass and a DataWriter subclass together to provide a unified approach to data I/O.
DataReader | Base class for reading data from an input source.
DataWriter | Base class for writing data to storage area(s).
FileHandler | DataHandler specifically tailored to reading and writing files of a specific type.
FileWriter | Base class for file-based DataWriters.
Retriever | Base class for retrieving data used as input to tsdat pipelines.
Storage | Abstract base class for the tsdat Storage API. Subclasses of Storage are used in pipelines to persist data and ancillary files (e.g., plots).
class tsdat.io.base.DataConverter
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for running data conversions on a retrieved raw dataset.
Class Methods
convert | Runs the data converter on the provided (retrieved) dataset.
Method Descriptions
abstract convert(self, dataset: xarray.Dataset, dataset_config: tsdat.config.dataset.DatasetConfig, variable_name: str, **kwargs: Any) → xarray.Dataset
Runs the data converter on the provided (retrieved) dataset.
- Parameters
dataset (xr.Dataset) – The dataset to convert.
dataset_config (DatasetConfig) – The dataset configuration.
variable_name (str) – The name of the variable to convert.
- Returns
The converted dataset.
- Return type
xr.Dataset
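As a rough illustration of the `convert` contract, the sketch below defines a stand-in abstract base with the same method signature and one hypothetical subclass. It uses a plain dict of lists in place of `xarray.Dataset` and ignores `dataset_config`, so it runs with the standard library alone. `ConverterSketch`, `ScaleConverter`, and `factor` are assumptions made for illustration and are not tsdat names.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Stand-in for xarray.Dataset: maps variable names to lists of values.
DatasetSketch = Dict[str, List[float]]


class ConverterSketch(ABC):
    """Hypothetical stand-in mirroring the documented convert() signature."""

    @abstractmethod
    def convert(
        self,
        dataset: DatasetSketch,
        dataset_config: Any,
        variable_name: str,
        **kwargs: Any,
    ) -> DatasetSketch:
        ...


class ScaleConverter(ConverterSketch):
    """Hypothetical converter that multiplies one variable by a constant factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def convert(self, dataset, dataset_config, variable_name, **kwargs):
        # Build a new mapping so the retrieved input is left untouched.
        converted = dict(dataset)
        converted[variable_name] = [v * self.factor for v in dataset[variable_name]]
        return converted


raw = {"distance": [1.0, 2.5], "temp": [20.0, 21.0]}
converted = ScaleConverter(factor=1000.0).convert(raw, None, "distance")
```

Only the variable named by `variable_name` changes here; a real converter would return an `xr.Dataset` and could consult `dataset_config` for target units or types.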
class tsdat.io.base.DataHandler
Bases: tsdat.utils.ParameterizedClass
Class that groups a DataReader subclass and a DataWriter subclass together to provide a unified approach to data I/O.
- Parameters
reader (DataReader) – The DataReader subclass responsible for handling input data reading.
writer (FileWriter) – The FileWriter subclass responsible for handling output data writing.
class tsdat.io.base.DataReader
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for reading data from an input source.
- Parameters
regex (Pattern[str]) – The regex pattern associated with the DataReader. If calling the DataReader from a tsdat pipeline, this pattern will be checked against each possible input key before the read() method is called.
Class Methods
read | Uses the input key to open a resource and load data as an xr.Dataset object or as a mapping of strings to xr.Dataset objects.
Method Descriptions
abstract read(self, input_key: str) → Union[xarray.Dataset, Dict[str, xarray.Dataset]]
Uses the input key to open a resource and load data as an xr.Dataset object or as a mapping of strings to xr.Dataset objects.
In most cases DataReaders will only need to return a single xr.Dataset, but occasionally some types of inputs necessitate that the data loaded from the input_key be returned as a mapping. For example, if the input_key is a path to a zip file containing multiple disparate datasets, then returning a mapping is appropriate.
- Parameters
input_key (str) – An input key matching the DataReader’s regex pattern that should be used to load data.
- Returns
The raw data extracted from the provided input key.
- Return type
Union[xr.Dataset, Dict[str, xr.Dataset]]
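The zip-file case mentioned above can be sketched as follows. This is a minimal illustration rather than tsdat code: `ReaderSketch` and `ZipReaderSketch` are hypothetical names, each "dataset" is a list of text lines standing in for an `xr.Dataset`, and the archive contents are made up for the example.

```python
import os
import tempfile
import zipfile
from abc import ABC, abstractmethod
from typing import Dict, List, Union

# Stand-in for xarray.Dataset: the text lines of one file.
DatasetSketch = List[str]


class ReaderSketch(ABC):
    """Hypothetical stand-in mirroring the documented read() signature."""

    @abstractmethod
    def read(self, input_key: str) -> Union[DatasetSketch, Dict[str, DatasetSketch]]:
        ...


class ZipReaderSketch(ReaderSketch):
    """Hypothetical reader that returns one entry per file inside a zip archive."""

    def read(self, input_key: str) -> Dict[str, DatasetSketch]:
        datasets: Dict[str, DatasetSketch] = {}
        with zipfile.ZipFile(input_key) as archive:
            for name in archive.namelist():
                text = archive.read(name).decode("utf-8")
                datasets[name] = text.splitlines()
        return datasets


# Build a small archive with two unrelated files, then read it back as a mapping.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "inputs.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("met.csv", "time,temp\n0,20.5\n")
        archive.writestr("wind.csv", "time,speed\n0,3.1\n")
    loaded = ZipReaderSketch().read(zip_path)
```

Keying the mapping by archive member name is a design choice of this sketch; the documentation only requires that the keys be strings.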
class tsdat.io.base.DataWriter
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for writing data to storage area(s).
Class Methods
write | Writes the dataset to the storage area. This method is typically called by the tsdat storage API, which will be responsible for providing any additional parameters required by subclasses of the tsdat.io.base.DataWriter class.
Method Descriptions
abstract write(self, dataset: xarray.Dataset, **kwargs: Any) → None
Writes the dataset to the storage area. This method is typically called by the tsdat storage API, which will be responsible for providing any additional parameters required by subclasses of the tsdat.io.base.DataWriter class.
- Parameters
dataset (xr.Dataset) – The dataset to save.
class tsdat.io.base.FileHandler
Bases: DataHandler
DataHandler specifically tailored to reading and writing files of a specific type.
- Parameters
reader (DataReader) – The DataReader subclass responsible for handling input data reading.
writer (FileWriter) – The FileWriter subclass responsible for handling output data writing.
class tsdat.io.base.FileWriter
Bases: DataWriter, abc.ABC
Base class for file-based DataWriters.
- Parameters
file_extension (str) – The file extension that the FileHandler should be used for, e.g., ".nc", ".csv", ...
Class Methods
write | Writes the dataset to the provided filepath. This method is typically called by the tsdat storage API, which will be responsible for providing the filepath, including the file extension.
Method Descriptions
abstract write(self, dataset: xarray.Dataset, filepath: Optional[pathlib.Path] = None, **kwargs: Any) → None
Writes the dataset to the provided filepath. This method is typically called by the tsdat storage API, which will be responsible for providing the filepath, including the file extension.
- Parameters
dataset (xr.Dataset) – The dataset to save.
filepath (Optional[Path]) – The path to the file to save.
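A file-based writer along these lines might look like the sketch below. It assumes a dict of equal-length lists in place of `xr.Dataset` and writes CSV with the standard library; `FileWriterSketch` and `CsvWriterSketch` are hypothetical names, and rejecting a missing `filepath` is a choice of this sketch, not documented tsdat behavior.

```python
import csv
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List, Optional

# Stand-in for xarray.Dataset: maps variable names to equal-length lists.
DatasetSketch = Dict[str, List[float]]


class FileWriterSketch(ABC):
    """Hypothetical stand-in mirroring the documented FileWriter.write() signature."""

    file_extension: str

    @abstractmethod
    def write(
        self, dataset: DatasetSketch, filepath: Optional[Path] = None, **kwargs: Any
    ) -> None:
        ...


class CsvWriterSketch(FileWriterSketch):
    """Hypothetical writer that stores each variable as one CSV column."""

    file_extension = ".csv"

    def write(self, dataset, filepath=None, **kwargs):
        if filepath is None:
            # The documented signature allows None; this sketch requires a path.
            raise ValueError("filepath is required by this sketch")
        names = list(dataset)
        with open(filepath, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(names)
            # zip(*columns) turns the per-variable lists into rows.
            writer.writerows(zip(*(dataset[name] for name in names)))


# The caller (the storage API, per the description above) supplies the full path.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / ("example" + CsvWriterSketch.file_extension)
    CsvWriterSketch().write({"time": [0, 1], "temp": [20.5, 21.0]}, out_path)
    written = out_path.read_text().splitlines()
```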
class tsdat.io.base.Retriever
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for retrieving data used as input to tsdat pipelines.
- Parameters
readers (Dict[str, DataReader]) – The mapping of readers that should be used to retrieve data given input_keys and optional keyword arguments provided by subclasses of Retriever.
readers: Dict[Pattern, Any]
Mapping of readers that should be used to read data given input keys.
Class Methods
retrieve | Prepares the raw dataset mapping for use in downstream pipeline processes by consolidating the data into a single xr.Dataset object.
Method Descriptions
abstract retrieve(self, input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, **kwargs: Any) → xarray.Dataset
Prepares the raw dataset mapping for use in downstream pipeline processes by consolidating the data into a single xr.Dataset object. The retrieved dataset may contain additional coords and data_vars that are not defined in the output dataset. Input data converters are applied as part of the preparation process.
- Parameters
input_keys (List[str]) – The input keys the registered DataReaders should read from.
dataset_config (DatasetConfig) – The specification of the output dataset.
- Returns
The retrieved dataset.
- Return type
xr.Dataset
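One way to picture the retrieval step is the sketch below, which matches each input key against a mapping of compiled patterns to readers and merges the results into one structure. It is an assumption-laden illustration: `retrieve_sketch` is a hypothetical function, readers are plain callables, datasets are dicts of lists rather than `xr.Dataset` objects, `Pattern.match` is one possible matching rule, and the converter step described above is omitted.

```python
import re
from typing import Callable, Dict, List, Pattern

# Stand-in for xarray.Dataset: maps variable names to lists of values.
DatasetSketch = Dict[str, List[float]]
ReadFunc = Callable[[str], DatasetSketch]


def retrieve_sketch(
    input_keys: List[str],
    readers: Dict[Pattern[str], ReadFunc],
) -> DatasetSketch:
    """Read each input key with the first reader whose pattern matches, then merge."""
    merged: DatasetSketch = {}
    for key in input_keys:
        for pattern, read in readers.items():
            if pattern.match(key):
                # Later inputs overwrite earlier variables of the same name.
                merged.update(read(key))
                break
        else:
            # No pattern matched this key.
            raise ValueError(f"No reader registered for input key: {key!r}")
    return merged


# Hypothetical readers that return fixed data regardless of the key.
readers = {
    re.compile(r".*\.csv"): lambda key: {"temp": [20.5, 21.0]},
    re.compile(r".*\.txt"): lambda key: {"speed": [3.1, 2.9]},
}
retrieved = retrieve_sketch(["a/met.csv", "a/wind.txt"], readers)
```

Raising on an unmatched key is a choice made here for clarity; a real Retriever subclass could just as reasonably skip such keys.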
class tsdat.io.base.Storage
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Abstract base class for the tsdat Storage API. Subclasses of Storage are used in pipelines to persist data and ancillary files (e.g., plots).
- Parameters
parameters (Any) – Configuration parameters for the Storage API. The specific parameters that are allowed will be defined by subclasses of this base class.
handler (DataHandler) – The DataHandler responsible for handling both read and write operations needed by the storage API.
handler: DataHandler
Defines methods for reading and writing datasets from the storage area.
parameters: Any
(Internal) parameters used by the storage API that can be set through configuration files, environment variables, or other means.
Class Methods
fetch_data | Fetches a dataset from the storage area where the dataset’s time span is between the specified start and end times.
save_ancillary_file | Saves an ancillary file (e.g., a plot, non-dataset metadata file, etc.) to the storage area for the specified datastream.
save_data | Saves the dataset to the storage area.
uploadable_dir | Context manager that can be used to upload many ancillary files at once.
Method Descriptions
abstract fetch_data(self, start: datetime.datetime, end: datetime.datetime, datastream: str) → xarray.Dataset
Fetches a dataset from the storage area where the dataset’s time span is between the specified start and end times.
- Parameters
start (datetime) – The start time bound.
end (datetime) – The end time bound.
datastream (str) – The name of the datastream to fetch.
- Returns
The fetched dataset.
- Return type
xr.Dataset
abstract save_ancillary_file(self, filepath: pathlib.Path, datastream: str)
Saves an ancillary file (e.g., a plot, non-dataset metadata file, etc.) to the storage area for the specified datastream.
- Parameters
filepath (Path) – Where the file that should be saved is currently located.
datastream (str) – The datastream that the ancillary file is associated with.
abstract save_data(self, dataset: xarray.Dataset)
Saves the dataset to the storage area.
- Parameters
dataset (xr.Dataset) – The dataset to save.
uploadable_dir(self, datastream: str) → Generator[pathlib.Path, None, None]
Context manager that can be used to upload many ancillary files at once. This method yields the path to a temporary directory whose contents will be saved to the storage area using the save_ancillary_file method upon exiting the context manager.
- Parameters
datastream (str) – The datastream associated with any files written to the uploadable directory.
- Yields
Generator[Path, None, None] – A temporary directory whose contents should be saved to the storage area.
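The behavior described for `uploadable_dir` can be approximated with `contextlib.contextmanager` and a temporary directory, as in the sketch below. `StorageSketch` is a hypothetical stand-in that records saved files in a list instead of persisting them, the datastream name is made up, and the actual tsdat implementation may differ in details such as error handling or nested directories.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Generator, List, Tuple


class StorageSketch:
    """Hypothetical stand-in that only records which ancillary files were saved."""

    def __init__(self) -> None:
        self.saved: List[Tuple[str, str]] = []

    def save_ancillary_file(self, filepath: Path, datastream: str) -> None:
        # A real storage class would copy or upload the file here.
        self.saved.append((filepath.name, datastream))

    @contextmanager
    def uploadable_dir(self, datastream: str) -> Generator[Path, None, None]:
        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            yield tmp_path
            # Runs when the caller's with-block exits normally,
            # before the temporary directory is removed.
            for path in sorted(tmp_path.glob("*")):
                if path.is_file():
                    self.save_ancillary_file(path, datastream)


storage = StorageSketch()
with storage.uploadable_dir(datastream="example.datastream") as upload_dir:
    (upload_dir / "plot_a.png").write_bytes(b"fake image")
    (upload_dir / "notes.txt").write_text("ancillary notes")
```

In this sketch nothing is saved if the `with` block raises, because the code after `yield` is skipped; whether tsdat behaves the same way is not stated in this reference.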