tsdat.io.retrievers

Classes

DefaultRetriever

Default API for retrieving data from one or more input sources.

StorageRetriever

Retriever API for pulling input data from the storage area.

class tsdat.io.retrievers.DefaultRetriever

Bases: tsdat.io.base.Retriever

Default API for retrieving data from one or more input sources.

Reads data from one or more inputs, renames coordinates and data variables according to retrieval and dataset configurations, and applies registered DataConverters to retrieved data.

Parameters
  • readers (Dict[Pattern[str], DataReader]) – A mapping of patterns to DataReaders that the retriever uses to determine which DataReader to use for reading any given input key.

  • coords (Dict[str, Dict[Pattern[str], VariableRetriever]]) – A dictionary mapping output coordinate variable names to rules for how they should be retrieved.

  • data_vars (Dict[str, Dict[Pattern[str], VariableRetriever]]) – A dictionary mapping output data variable names to rules for how they should be retrieved.
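The pattern-to-reader mapping described for readers can be illustrated with a minimal standard-library sketch. It does not use tsdat itself: read_csv, read_netcdf, and select_reader are hypothetical stand-ins, and the choice of re.match with first-match-wins ordering is an assumption, not a statement of how DefaultRetriever resolves readers internally.

```python
import re
from typing import Callable, Dict, Optional, Pattern


# Hypothetical stand-ins for DataReader objects; each just tags the key it read.
def read_csv(input_key: str) -> str:
    return f"csv:{input_key}"


def read_netcdf(input_key: str) -> str:
    return f"netcdf:{input_key}"


# Mapping of compiled patterns to readers, mirroring Dict[Pattern[str], DataReader].
readers: Dict[Pattern[str], Callable[[str], str]] = {
    re.compile(r".*\.csv"): read_csv,
    re.compile(r".*\.nc"): read_netcdf,
}


def select_reader(
    input_key: str, readers: Dict[Pattern[str], Callable[[str], str]]
) -> Optional[Callable[[str], str]]:
    """Return the first reader whose pattern matches the input key, else None.

    Assumption: patterns are tried in insertion order using re.match.
    """
    for pattern, reader in readers.items():
        if pattern.match(input_key):
            return reader
    return None
```

With this sketch, select_reader("data/buoy.csv", readers) returns read_csv, and a key that matches no pattern returns None, so the caller decides how to handle unreadable inputs.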

class Parameters

Bases: pydantic.BaseModel

merge_kwargs: Dict[str, Any]

Keyword arguments passed to xr.merge(). This is only relevant if multiple input keys are provided simultaneously, or if any registered DataReader objects could return a dataset mapping instead of a single dataset.

parameters: DefaultRetriever.Parameters

readers: Dict[Pattern, tsdat.io.base.DataReader]

A dictionary mapping input key patterns to the DataReaders used to read matching input keys.

Class Methods

retrieve

Prepares the raw dataset mapping for use in downstream pipeline processes.

Method Descriptions

retrieve(self, input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, **kwargs: Any) → xarray.Dataset

Prepares the raw dataset mapping for use in downstream pipeline processes.

This is done by consolidating the data into a single xr.Dataset object. The retrieved dataset may contain additional coords and data_vars that are not defined in the output dataset. Input data converters are applied as part of the preparation process.

Parameters
  • input_keys (List[str]) – The input keys the registered DataReaders should read from.

  • dataset_config (DatasetConfig) – The specification of the output dataset.

Returns

xr.Dataset – The retrieved dataset.

class tsdat.io.retrievers.StorageRetriever

Bases: tsdat.io.base.Retriever

Retriever API for pulling input data from the storage area.

Class Methods

retrieve

Retrieves input data from the storage area.

Method Descriptions

retrieve(self, input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, storage: Optional[tsdat.io.base.Storage] = None, **kwargs: Any) → xarray.Dataset

Retrieves input data from the storage area.

Note that each input key is expected to follow this format:

“datastream::start-date::end-date”,

e.g., “sgp.myingest.b1::20220913.000000::20220914.000000”

This format allows the retriever to pull datastream data from the Storage API for the desired dates for each desired input source.
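As an illustration of the key format above, the following sketch splits an input key into its three parts using only the standard library. StorageInputKey and parse_input_key are hypothetical names, not part of tsdat, and the timestamp layout "%Y%m%d.%H%M%S" is an assumption inferred from the example key rather than a documented specification.

```python
from datetime import datetime
from typing import NamedTuple


class StorageInputKey(NamedTuple):
    """Hypothetical container for the parts of a storage input key."""

    datastream: str
    start: datetime
    end: datetime


# Assumed timestamp layout, inferred from "20220913.000000" in the example key.
DATE_FORMAT = "%Y%m%d.%H%M%S"


def parse_input_key(input_key: str) -> StorageInputKey:
    """Split "datastream::start-date::end-date" into its components."""
    parts = input_key.split("::")
    if len(parts) != 3:
        raise ValueError(
            f"Expected 'datastream::start-date::end-date', got {input_key!r}"
        )
    datastream, start, end = parts
    return StorageInputKey(
        datastream=datastream,
        start=datetime.strptime(start, DATE_FORMAT),
        end=datetime.strptime(end, DATE_FORMAT),
    )


key = parse_input_key("sgp.myingest.b1::20220913.000000::20220914.000000")
print(key.datastream, key.start.isoformat(), key.end.isoformat())
```

Raising ValueError on a malformed key is a design choice of this sketch; it surfaces formatting mistakes before any storage lookup is attempted.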

Parameters
  • input_keys (List[str]) – A list of specially-formatted input keys.

  • dataset_config (DatasetConfig) – The output dataset configuration.

  • storage (Optional[Storage]) – Instance of a Storage class used to fetch saved data.

Returns

xr.Dataset – The retrieved dataset.