tsdat.io.retrievers

Classes

DefaultRetriever

Default API for retrieving data from one or more input sources.

StorageRetriever

Retriever API for pulling input data from the storage area.

StorageRetrieverInput

Returns an object representation of an input storage key.

class tsdat.io.retrievers.DefaultRetriever[source]

Bases: tsdat.io.base.Retriever

Default API for retrieving data from one or more input sources.

Reads data from one or more inputs, renames coordinates and data variables according to retrieval and dataset configurations, and applies registered DataConverters to retrieved data.

Parameters:
  • readers (Dict[Pattern[str], DataReader]) – A mapping of patterns to DataReaders that the retriever uses to determine which DataReader to use for reading any given input key.

  • coords (Dict[str, Dict[Pattern[str], VariableRetriever]]) – A dictionary mapping output coordinate variable names to rules for how they should be retrieved.

  • data_vars (Dict[str, Dict[Pattern[str], VariableRetriever]]) – A dictionary mapping output data variable names to rules for how they should be retrieved.
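The nested `coords` and `data_vars` mappings can be read as "for this output variable, and for input keys matching this pattern, apply this rule". The following is a minimal standard-library sketch of that lookup, not tsdat's implementation. `VariableRule` is a hypothetical stand-in for `VariableRetriever` that models only an assumed `name` field (the variable's name in the input), and the first-match-wins behavior using `Pattern.match` is likewise an assumption.

```python
import re
from dataclasses import dataclass
from typing import Dict, Pattern


@dataclass
class VariableRule:
    """Hypothetical stand-in for a VariableRetriever; only the input name is modeled."""

    name: str


def build_rename_map(
    input_key: str,
    rules: Dict[str, Dict[Pattern[str], VariableRule]],
) -> Dict[str, str]:
    """Map input variable names to output names for a single input key."""
    rename_map: Dict[str, str] = {}
    for output_name, by_pattern in rules.items():
        for pattern, rule in by_pattern.items():
            # Assumption: the first pattern that matches the input key wins.
            if pattern.match(input_key):
                rename_map[rule.name] = output_name
                break
    return rename_map


# Illustrative rules: the file names and variable names are made up.
data_vars = {
    "temperature": {
        re.compile(r".*buoy.*\.csv"): VariableRule(name="temp_c"),
        re.compile(r".*\.nc"): VariableRule(name="T"),
    },
    "pressure": {re.compile(r".*buoy.*\.csv"): VariableRule(name="pres")},
}

csv_map = build_rename_map("data/buoy_2023.csv", data_vars)
nc_map = build_rename_map("data/model.nc", data_vars)
```

Here `csv_map` pairs both input names with their output names, while `nc_map` contains only the temperature entry because no pressure rule matches the `.nc` key.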

class Parameters[source]

Bases: pydantic.BaseModel

merge_kwargs: Dict[str, Any][source]

Keyword arguments passed to xr.merge(). This is only relevant if multiple input keys are provided simultaneously, or if any registered DataReader objects could return a dataset mapping instead of a single dataset.

parameters: DefaultRetriever.Parameters[source]

readers: Dict[Pattern, tsdat.io.base.DataReader][source]

A dictionary mapping input key patterns to the DataReaders used to read matching input keys.
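As an illustration of how a pattern-keyed `readers` dictionary could be used to pick a reader for an input key, here is a small sketch. It is not taken from tsdat: the `Reader` alias is a hypothetical stand-in for `DataReader`, and trying patterns in insertion order with `Pattern.match` is an assumption about the matching semantics.

```python
import re
from typing import Callable, Dict, Optional, Pattern

# Hypothetical stand-in for a DataReader: any callable that takes an input key.
Reader = Callable[[str], str]


def select_reader(
    input_key: str, readers: Dict[Pattern[str], Reader]
) -> Optional[Reader]:
    """Return the first reader whose pattern matches the input key, else None."""
    for pattern, reader in readers.items():
        # Assumption: patterns are tried in insertion order with Pattern.match.
        if pattern.match(input_key):
            return reader
    return None


# Illustrative readers that only tag the key with the format they would read.
readers = {
    re.compile(r".*\.csv"): lambda key: f"csv:{key}",
    re.compile(r".*\.nc"): lambda key: f"netcdf:{key}",
}

reader = select_reader("data/buoy_2023.csv", readers)
result = reader("data/buoy_2023.csv") if reader else None
missing = select_reader("notes.txt", readers)
```

Returning `None` for an unmatched key is a design choice of this sketch; a real retriever might instead raise an error or skip the key.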

Class Methods

retrieve

Prepares the raw dataset mapping for use in downstream pipeline processes.

Method Descriptions

retrieve(input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, **kwargs: Any) → xarray.Dataset[source]

Prepares the raw dataset mapping for use in downstream pipeline processes.

This is done by consolidating the data into a single xr.Dataset object. The retrieved dataset may contain additional coords and data_vars that are not defined in the output dataset. Input data converters are applied as part of the preparation process.

Parameters:
  • input_keys (List[str]) – The input keys the registered DataReaders should read from.

  • dataset_config (DatasetConfig) – The specification of the output dataset.

Returns:

xr.Dataset – The retrieved dataset.

class tsdat.io.retrievers.StorageRetriever[source]

Bases: tsdat.io.base.Retriever

Retriever API for pulling input data from the storage area.

class TransParameters[source]

Bases: pydantic.BaseModel

trans_params: GlobalARMTransformParams | None[source]

parameters: StorageRetriever.TransParameters | None[source]

Class Methods

retrieve

Retrieves input data from the storage area.

Method Descriptions

retrieve(input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, storage: tsdat.io.base.Storage | None = None, input_data_hook: Callable[[Dict[str, xarray.Dataset]], Dict[str, xarray.Dataset]] | None = None, **kwargs: Any) → xarray.Dataset[source]

Retrieves input data from the storage area.

Each input key is expected to follow this format:

`"--key1 value1 --key2 value2"`

e.g.,

`"--datastream sgp.met.b0 --start 20230801 --end 20230901"`

`"--datastream sgp.met.b0 --start 20230801 --end 20230901 --location_id sgp --data_level b0"`

This format allows the retriever to pull datastream data from the Storage API for the desired dates for each desired input source.
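To make the `--key value` format concrete, the sketch below splits such a string into a dictionary using only the standard library. The function name `parse_input_key` is hypothetical, and this is not how tsdat itself parses keys; it only illustrates the structure of the format described above.

```python
import shlex
from typing import Dict


def parse_input_key(input_key: str) -> Dict[str, str]:
    """Split a "--key1 value1 --key2 value2" string into a dict of key/value pairs."""
    tokens = shlex.split(input_key)
    if len(tokens) % 2 != 0:
        raise ValueError(f"Expected '--key value' pairs, got: {input_key!r}")
    parsed: Dict[str, str] = {}
    for flag, value in zip(tokens[::2], tokens[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"Expected a '--key' flag, got: {flag!r}")
        # Drop the leading "--" so the flag name becomes the dictionary key.
        parsed[flag[2:]] = value
    return parsed


parsed = parse_input_key("--datastream sgp.met.b0 --start 20230801 --end 20230901")
```

All values stay as strings here; converting `start` and `end` to dates is left to the caller.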

Parameters:
  • input_keys (List[str]) – A list of input keys formatted as described above.

  • dataset_config (DatasetConfig) – The output dataset configuration.

  • storage (Storage | None) – Instance of a Storage class used to fetch saved data. Defaults to None.

Returns:

xr.Dataset – The retrieved dataset.

class tsdat.io.retrievers.StorageRetrieverInput(input_key: str)[source]

Returns an object representation of an input storage key.

Input storage keys should be formatted like:

`"--datastream sgp.met.b0 --start 20230801 --end 20230901"`

`"--datastream sgp.met.b0 --start 20230801 --end 20230901 --location_id sgp --data_level b0"`
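One way to picture an "object representation" of such a key is a small class that parses the string once and exposes its parts as attributes. The sketch below is illustrative only and is not the tsdat implementation: the class name `StorageKeySketch`, its attribute names, the `YYYYMMDD` date format (inferred from the examples), and the handling of extra flags are all assumptions.

```python
import argparse
import shlex
from datetime import datetime
from typing import Dict


class StorageKeySketch:
    """Hypothetical illustration only; not the tsdat implementation."""

    def __init__(self, input_key: str) -> None:
        parser = argparse.ArgumentParser(allow_abbrev=False)
        parser.add_argument("--datastream", required=True)
        parser.add_argument("--start", required=True)
        parser.add_argument("--end", required=True)
        # Flags the parser does not know about are returned in `extra`.
        known, extra = parser.parse_known_args(shlex.split(input_key))

        self.input_key = input_key
        self.datastream: str = known.datastream
        # Assumption: dates use the YYYYMMDD form seen in the examples.
        self.start = datetime.strptime(known.start, "%Y%m%d")
        self.end = datetime.strptime(known.end, "%Y%m%d")
        # Remaining "--key value" pairs are kept as free-form extras.
        self.extras: Dict[str, str] = {
            flag[2:]: value for flag, value in zip(extra[::2], extra[1::2])
        }

    def __repr__(self) -> str:
        return (
            f"StorageKeySketch(datastream={self.datastream!r}, "
            f"start={self.start:%Y%m%d}, end={self.end:%Y%m%d}, "
            f"extras={self.extras!r})"
        )


key = StorageKeySketch(
    "--datastream sgp.met.b0 --start 20230801 --end 20230901 "
    "--location_id sgp --data_level b0"
)
```

In this sketch, `key.datastream`, `key.start`, and `key.end` hold the three named fields, and the remaining flags (`location_id`, `data_level`) land in `key.extras`.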

Class Methods

__repr__

Return repr(self).

Method Descriptions

__repr__() → str[source]

Return repr(self).