tsdat.io.base
Classes
DataConverter | Base class for running data conversions on retrieved raw data.
DataHandler | Groups a DataReader subclass and a DataWriter subclass together.
DataReader | Base class for reading data from an input source.
DataWriter | Base class for writing data to storage area(s).
FileHandler | DataHandler specifically tailored to reading and writing files of a specific type.
FileWriter | Base class for file-based DataWriters.
RetrievalRuleSelections | Maps variable names to the rules and conversions that should be applied.
RetrievedDataset | Maps variable names to the input DataArray the data are retrieved from.
Retriever | Base class for retrieving data used as input to tsdat pipelines.
Storage | Abstract base class for the tsdat Storage API. Subclasses of Storage are used in pipelines to persist data and ancillary files (e.g., plots).
- class tsdat.io.base.DataConverter
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for running data conversions on retrieved raw data.
Class Methods
convert | Runs the data converter on the retrieved data.
Method Descriptions
- abstract convert(self, data: xarray.DataArray, variable_name: str, dataset_config: tsdat.config.dataset.DatasetConfig, retrieved_dataset: RetrievedDataset, **kwargs: Any) -> Optional[xarray.DataArray]
Runs the data converter on the retrieved data.
- Parameters
data (xr.DataArray) – The retrieved DataArray to convert.
retrieved_dataset (RetrievedDataset) – The retrieved dataset containing data to convert.
dataset_config (DatasetConfig) – The output dataset configuration.
variable_name (str) – The name of the variable to convert.
- Returns
Optional[xr.DataArray] – The converted DataArray for the specified variable, or None if the conversion was done in-place.
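A minimal sketch of the convert contract, assuming nothing beyond the signature above. To stay runnable without tsdat or xarray installed, it uses a stand-in abstract class (ConverterSketch) and plain lists and dicts in place of xr.DataArray, DatasetConfig, and RetrievedDataset. The ScaleConverter name and its factor parameter are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class ConverterSketch(ABC):
    """Stand-in for tsdat.io.base.DataConverter (not the real class)."""

    @abstractmethod
    def convert(
        self,
        data: List[float],
        variable_name: str,
        dataset_config: Dict[str, Any],
        retrieved_dataset: Dict[str, List[float]],
        **kwargs: Any,
    ) -> Optional[List[float]]:
        ...


class ScaleConverter(ConverterSketch):
    """Hypothetical converter that multiplies every value by a fixed factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def convert(self, data, variable_name, dataset_config, retrieved_dataset, **kwargs):
        # Return a new object rather than mutating the input. Returning None
        # instead would signal that the conversion was done in place.
        return [value * self.factor for value in data]


retrieved = {"temp": [1.0, 2.0, 3.0]}
converted = ScaleConverter(factor=2.0).convert(
    retrieved["temp"], "temp", dataset_config={}, retrieved_dataset=retrieved
)
```

Because this sketch returns a new list, the retrieved input is left untouched; a converter that modifies retrieved_dataset directly would return None instead.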
- class tsdat.io.base.DataHandler
Bases: tsdat.utils.ParameterizedClass
Groups a DataReader subclass and a DataWriter subclass together.
This provides a unified approach to data I/O. DataHandlers are typically expected to be able to round-trip the data, i.e., the following pseudocode is generally true:
handler.read(handler.write(dataset)) == dataset
- Parameters
reader (DataReader) – The DataReader subclass responsible for reading input data.
writer (FileWriter) – The FileWriter subclass responsible for writing output data.
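The round-trip expectation can be sketched with stand-in classes. This is an illustration under stated assumptions, not tsdat's implementation: JsonReader, JsonWriter, HandlerSketch, and round_trip are hypothetical names, a plain dict stands in for xr.Dataset, and JSON stands in for a real file format.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class JsonReader:
    """Stand-in reader: loads a dict from a JSON file."""

    def read(self, input_key: str) -> Dict[str, Any]:
        return json.loads(Path(input_key).read_text())


class JsonWriter:
    """Stand-in writer: saves a dict to a JSON file."""

    def write(self, dataset: Dict[str, Any], filepath: Path) -> None:
        filepath.write_text(json.dumps(dataset))


class HandlerSketch:
    """Stand-in for a DataHandler: groups one reader with one writer."""

    def __init__(self, reader: JsonReader, writer: JsonWriter) -> None:
        self.reader = reader
        self.writer = writer


def round_trip(handler: HandlerSketch, dataset: Dict[str, Any]) -> Dict[str, Any]:
    """Write the dataset to a temporary file, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "data.json"
        handler.writer.write(dataset, path)
        return handler.reader.read(str(path))


handler = HandlerSketch(reader=JsonReader(), writer=JsonWriter())
original = {"time": [0, 1, 2], "temp": [20.5, 21.0, 21.5]}
restored = round_trip(handler, original)
```

Since write returns None, the round trip goes through the written file rather than through a return value.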
- class tsdat.io.base.DataReader
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for reading data from an input source.
- Parameters
regex (Pattern[str]) – The regex pattern associated with the DataReader. If calling the DataReader from a tsdat pipeline, this pattern will be checked against each possible input key before the read() method is called.
Class Methods
read | Reads data given an input key.
Method Descriptions
- abstract read(self, input_key: str) -> Union[xarray.Dataset, Dict[str, xarray.Dataset]]
Reads data given an input key.
Uses the input key to open a resource and load data as an xr.Dataset object or as a mapping of strings to xr.Dataset objects.
In most cases DataReaders will only need to return a single xr.Dataset, but occasionally some types of inputs necessitate that the data loaded from the input_key be returned as a mapping. For example, if the input_key is a path to a zip file containing multiple disparate datasets, then returning a mapping is appropriate.
- Parameters
input_key (str) – An input key matching the DataReader’s regex pattern that should be used to load data.
- Returns
Union[xr.Dataset, Dict[str, xr.Dataset]] – The raw data extracted from the provided input key.
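The pattern check described for the regex parameter might look like the sketch below. ListReader and read_all are hypothetical names, a plain dict stands in for xr.Dataset, and the use of Pattern.match (anchored at the start of the key) is an assumption rather than documented behavior.

```python
import re
from typing import Dict, List


class ListReader:
    """Stand-in reader that 'loads' a dataset as a plain dict."""

    def __init__(self, label: str) -> None:
        self.label = label

    def read(self, input_key: str) -> Dict[str, str]:
        return {"source": input_key, "read_by": self.label}


def read_all(
    readers: Dict[re.Pattern, ListReader], input_keys: List[str]
) -> Dict[str, Dict[str, str]]:
    """Read each input key with the first reader whose pattern matches it."""
    results: Dict[str, Dict[str, str]] = {}
    for key in input_keys:
        for pattern, reader in readers.items():
            # Assumption: the pattern is matched from the start of the key.
            if pattern.match(key):
                results[key] = reader.read(key)
                break
    return results


readers = {
    re.compile(r".*\.csv"): ListReader("csv"),
    re.compile(r".*\.nc"): ListReader("netcdf"),
}
loaded = read_all(readers, ["data/a.csv", "data/b.nc", "notes.txt"])
```

Keys that match no pattern, such as notes.txt here, are skipped without calling read().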
- class tsdat.io.base.DataWriter
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for writing data to storage area(s).
Class Methods
write | Writes the dataset to the storage area.
Method Descriptions
- abstract write(self, dataset: xarray.Dataset, **kwargs: Any) -> None
Writes the dataset to the storage area.
This method is typically called by the tsdat storage API, which will be responsible for providing any additional parameters required by subclasses of the tsdat.io.base.DataWriter class.
- Parameters
dataset (xr.Dataset) – The dataset to save.
- class tsdat.io.base.FileHandler
Bases: DataHandler
DataHandler specifically tailored to reading and writing files of a specific type.
- Parameters
extension (str) – The specific file extension used for data files, e.g., “.nc”.
reader (DataReader) – The DataReader subclass responsible for reading input data.
writer (FileWriter) – The FileWriter subclass responsible for writing output data.
- class tsdat.io.base.FileWriter
Bases: DataWriter, abc.ABC
Base class for file-based DataWriters.
- Parameters
file_extension (str) – The file extension that the FileHandler should be used for, e.g., “.nc”, “.csv”, …
Class Methods
write | Writes the dataset to the provided filepath.
Method Descriptions
- abstract write(self, dataset: xarray.Dataset, filepath: Optional[pathlib.Path] = None, **kwargs: Any) -> None
Writes the dataset to the provided filepath.
This method is typically called by the tsdat storage API, which will be responsible for providing the filepath, including the file extension.
- Parameters
dataset (xr.Dataset) – The dataset to save.
filepath (Optional[Path]) – The path to the file to save.
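As a rough illustration of a file-based writer, the sketch below mirrors the write signature with stand-in classes. FileWriterSketch and CsvWriterSketch are hypothetical, a dict of equal-length columns stands in for xr.Dataset, and raising on a missing filepath is a choice made for this sketch only.

```python
import csv
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List, Optional


class FileWriterSketch(ABC):
    """Stand-in for tsdat.io.base.FileWriter (not the real class)."""

    file_extension: str

    @abstractmethod
    def write(
        self,
        dataset: Dict[str, List[Any]],
        filepath: Optional[Path] = None,
        **kwargs: Any,
    ) -> None:
        ...


class CsvWriterSketch(FileWriterSketch):
    """Hypothetical writer that saves columns as a CSV file."""

    file_extension = ".csv"

    def write(self, dataset, filepath=None, **kwargs):
        if filepath is None:
            raise ValueError("this sketch requires an explicit filepath")
        columns = list(dataset)
        with filepath.open("w", newline="") as handle:
            out = csv.writer(handle)
            out.writerow(columns)  # header row
            out.writerows(zip(*(dataset[name] for name in columns)))


# The caller (the storage API, in tsdat's case) supplies the full path,
# including the file extension.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / f"example{CsvWriterSketch.file_extension}"
    CsvWriterSketch().write({"time": [0, 1], "temp": [20.5, 21.0]}, target)
    written = target.read_text().splitlines()
```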
- class tsdat.io.base.RetrievalRuleSelections
Bases: NamedTuple
Maps variable names to the rules and conversions that should be applied.
- class tsdat.io.base.RetrievedDataset
Bases: NamedTuple
Maps variable names to the input DataArray the data are retrieved from.
- class tsdat.io.base.Retriever
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for retrieving data used as input to tsdat pipelines.
- Parameters
readers (Dict[str, DataReader]) – The mapping of readers that should be used to retrieve data given input_keys and optional keyword arguments provided by subclasses of Retriever.
- coords: Dict[str, Dict[Pattern, RetrievedVariable]]
A dictionary mapping output coordinate names to the retrieval rules and preprocessing actions (e.g., DataConverters) that should be applied to each retrieved coordinate variable.
- data_vars: Dict[str, Dict[Pattern, RetrievedVariable]]
A dictionary mapping output data variable names to the retrieval rules and preprocessing actions (e.g., DataConverters) that should be applied to each retrieved data variable.
- readers: Optional[Dict[Pattern, Any]]
Mapping of readers that should be used to read data given input keys.
Class Methods
retrieve | Prepares the raw dataset mapping for use in downstream pipeline processes.
Method Descriptions
- abstract retrieve(self, input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, **kwargs: Any) -> xarray.Dataset
Prepares the raw dataset mapping for use in downstream pipeline processes.
This is done by consolidating the data into a single xr.Dataset object. The retrieved dataset may contain additional coords and data_vars that are not defined in the output dataset. Input data converters are applied as part of the preparation process.
- Parameters
input_keys (List[str]) – The input keys the registered DataReaders should read from.
dataset_config (DatasetConfig) – The specification of the output dataset.
- Returns
xr.Dataset – The retrieved dataset.
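The consolidation step can be sketched as follows. This is a simplified illustration, not tsdat's retrieval logic: retrieve_sketch is a hypothetical function, dicts of lists stand in for xr.Dataset objects, and the converters mapping is an assumed stand-in for the retrieval rules held in coords and data_vars.

```python
from typing import Callable, Dict, List

RawData = Dict[str, List[float]]


def retrieve_sketch(
    raw: Dict[str, RawData],
    converters: Dict[str, Callable[[List[float]], List[float]]],
) -> RawData:
    """Merge per-key raw data into one mapping, then apply converters."""
    merged: RawData = {}
    for input_key in sorted(raw):      # deterministic merge order
        merged.update(raw[input_key])  # later keys overwrite earlier ones
    for name, convert in converters.items():
        if name in merged:
            merged[name] = convert(merged[name])
    return merged


raw_mapping = {
    "a.csv": {"time": [0.0, 1.0], "temp_f": [32.0, 212.0]},
    "b.csv": {"humidity": [40.0, 45.0]},
}
dataset = retrieve_sketch(
    raw_mapping,
    # Hypothetical converter: degrees Fahrenheit to degrees Celsius.
    converters={"temp_f": lambda values: [(v - 32.0) * 5.0 / 9.0 for v in values]},
)
```

Variables with no converter, such as humidity here, pass through unchanged, which mirrors the note that the retrieved dataset may carry variables beyond those in the output definition.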
- class tsdat.io.base.Storage
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Abstract base class for the tsdat Storage API. Subclasses of Storage are used in pipelines to persist data and ancillary files (e.g., plots).
- Parameters
parameters (Any) – Configuration parameters for the Storage API. The specific parameters that are allowed will be defined by subclasses of this base class.
handler (DataHandler) – The DataHandler responsible for handling both read and write operations needed by the storage API.
- handler: DataHandler
Defines methods for reading and writing datasets from the storage area.
- parameters: Optional[Any]
(Internal) parameters used by the storage API that can be set through configuration files, environment variables, or other means.
Class Methods
fetch_data | Fetches a dataset from the storage area.
save_ancillary_file | Saves an ancillary file to the storage area for the specified datastream.
save_data | Saves the dataset to the storage area.
uploadable_dir | Context manager that can be used to upload many ancillary files at once.
Method Descriptions
- abstract fetch_data(self, start: datetime.datetime, end: datetime.datetime, datastream: str) -> xarray.Dataset
Fetches a dataset from the storage area.
The timespan of the returned dataset is between the specified start and end times.
- Parameters
start (datetime) – The start time bound.
end (datetime) – The end time bound.
datastream (str) – The name of the datastream to fetch.
- Returns
xr.Dataset – The fetched dataset.
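A minimal in-memory sketch of time-bounded fetching is shown below. InMemoryStorageSketch is hypothetical, and lists of (timestamp, value) records stand in for xr.Dataset. The save_data here takes an explicit datastream argument for simplicity, unlike the documented signature. Whether the real bounds are inclusive is not stated above, so this sketch assumes an inclusive start and an exclusive end.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Record = Tuple[datetime, float]


class InMemoryStorageSketch:
    """Stand-in storage that keeps records in a dict keyed by datastream."""

    def __init__(self) -> None:
        self._records: Dict[str, List[Record]] = {}

    def save_data(self, datastream: str, records: List[Record]) -> None:
        self._records.setdefault(datastream, []).extend(records)

    def fetch_data(self, start: datetime, end: datetime, datastream: str) -> List[Record]:
        # Assumption: start is inclusive and end is exclusive.
        stored = self._records.get(datastream, [])
        return [record for record in stored if start <= record[0] < end]


storage = InMemoryStorageSketch()
storage.save_data(
    "example.datastream",
    [
        (datetime(2023, 1, 1, 0), 20.5),
        (datetime(2023, 1, 1, 1), 21.0),
        (datetime(2023, 1, 1, 2), 21.5),
    ],
)
fetched = storage.fetch_data(
    start=datetime(2023, 1, 1, 0),
    end=datetime(2023, 1, 1, 2),
    datastream="example.datastream",
)
```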
- abstract save_ancillary_file(self, filepath: pathlib.Path, datastream: str)
Saves an ancillary file to the storage area for the specified datastream.
Ancillary files are plots or other non-dataset metadata files.
- Parameters
filepath (Path) – Where the file that should be saved is currently located.
datastream (str) – The datastream that the ancillary file is associated with.
- abstract save_data(self, dataset: xarray.Dataset)
Saves the dataset to the storage area.
- Parameters
dataset (xr.Dataset) – The dataset to save.
- uploadable_dir(self, datastream: str) -> Generator[pathlib.Path, None, None]
Context manager that can be used to upload many ancillary files at once.
This method yields the path to a temporary directory whose contents will be saved to the storage area using the save_ancillary_file method upon exiting the context manager.
- Parameters
datastream (str) – The datastream associated with any files written to the uploadable directory.
- Yields
Generator[Path, None, None] – A temporary directory whose contents should be saved to the storage area.
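The behavior described for uploadable_dir can be sketched with contextlib and tempfile. StorageSketch is a hypothetical stand-in that records uploads in a list instead of persisting them. Skipping the upload when the with-block raises is an assumption of this sketch.

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Generator, List, Tuple


class StorageSketch:
    """Stand-in for a Storage subclass; records uploads instead of saving them."""

    def __init__(self) -> None:
        self.saved: List[Tuple[str, str]] = []

    def save_ancillary_file(self, filepath: Path, datastream: str) -> None:
        self.saved.append((datastream, filepath.name))

    @contextlib.contextmanager
    def uploadable_dir(self, datastream: str) -> Generator[Path, None, None]:
        tmp = tempfile.TemporaryDirectory()
        try:
            tmp_path = Path(tmp.name)
            yield tmp_path
            # Reached only if the with-block finished without raising.
            for path in sorted(tmp_path.glob("**/*")):
                if path.is_file():
                    self.save_ancillary_file(path, datastream)
        finally:
            tmp.cleanup()  # always remove the temporary directory


storage = StorageSketch()
with storage.uploadable_dir(datastream="example.datastream") as upload_dir:
    (upload_dir / "plot_a.png").write_bytes(b"fake image")
    (upload_dir / "plot_b.png").write_bytes(b"fake image")
```

Files are handed to save_ancillary_file only after the block exits, so code inside the block can write many files without triggering an upload for each one.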