tsdat.io.storage

Classes

FileSystem

Handles data storage and retrieval for file-based data formats.

FileSystemS3

Handles data storage and retrieval for file-based data formats in an AWS S3 bucket.

ZarrLocalStorage

Handles data storage and retrieval for zarr archives on a local filesystem.

class tsdat.io.storage.FileSystem[source]

Bases: tsdat.io.base.Storage

Handles data storage and retrieval for file-based data formats.

Formats that write to directories (such as zarr) are not supported by the FileSystem storage class.

Parameters
  • parameters (Parameters) – File-system specific parameters, such as the root path to where files should be saved, or additional keyword arguments to specific functions used by the storage API. See the FileSystem.Parameters class for more details.

  • handler (FileHandler) – The FileHandler class that should be used to handle data I/O within the storage API.

class Parameters[source]

Bases: pydantic.BaseSettings

file_timespan :Optional[str][source]
merge_fetched_data_kwargs :Dict[str, Any][source]
storage_root :pathlib.Path[source]

The path on disk where data and ancillary files will be saved. Defaults to the storage/root folder in the current working directory. The directory is created when this parameter is set, if it does not already exist.

handler :tsdat.io.handlers.FileHandler[source]
parameters :FileSystem.Parameters[source]
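
A minimal sketch of constructing this storage class directly in Python (tsdat pipelines typically declare storage in configuration files instead; the storage root and the use of NetCDFHandler here are illustrative assumptions):

    from pathlib import Path

    from tsdat.io.handlers import NetCDFHandler
    from tsdat.io.storage import FileSystem

    # Illustrative values; adjust the storage root and handler to your setup.
    storage = FileSystem(
        parameters=FileSystem.Parameters(storage_root=Path("storage/root")),
        handler=NetCDFHandler(),
    )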

Class Methods

fetch_data

Fetches data for a given datastream between a specified time range.

save_ancillary_file

Saves an ancillary filepath to the datastream's ancillary storage area.

save_data

Saves a dataset to the storage area.

Method Descriptions

fetch_data(self, start: datetime.datetime, end: datetime.datetime, datastream: str) -> xarray.Dataset[source]

Fetches data for a given datastream between a specified time range.

Note: this method is not smart; it searches for the appropriate data files using their filenames and does not filter within each data file.

Parameters
  • start (datetime) – The minimum datetime to fetch.

  • end (datetime) – The maximum datetime to fetch.

  • datastream (str) – The datastream id to search for.

Returns

xr.Dataset – A dataset containing all the data in the storage area that spans the specified datetimes.
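
Continuing the construction sketch above, a hedged example of fetching one day of data (the datastream id is hypothetical):

    from datetime import datetime

    # Fetch all stored data for this (hypothetical) datastream on 2022-04-01.
    dataset = storage.fetch_data(
        start=datetime(2022, 4, 1),
        end=datetime(2022, 4, 2),
        datastream="sgp.met.b1",
    )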

save_ancillary_file(self, filepath: pathlib.Path, datastream: str)[source]

Saves an ancillary filepath to the datastream’s ancillary storage area.

Parameters
  • filepath (Path) – The path to the ancillary file.

  • datastream (str) – The datastream that the file is related to.
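
For example, a pipeline might save a quality-control plot alongside the data it describes (the file path and datastream id below are hypothetical):

    from pathlib import Path

    # Hypothetical plot produced earlier in the pipeline.
    storage.save_ancillary_file(
        filepath=Path("plots/sgp.met.b1.20220401.000000.temperature.png"),
        datastream="sgp.met.b1",
    )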

save_data(self, dataset: xarray.Dataset)[source]

Saves a dataset to the storage area.

At a minimum, the dataset must have a ‘datastream’ global attribute and must have a ‘time’ variable with a np.datetime64-like data type.

Parameters

dataset (xr.Dataset) – The dataset to save.
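
A sketch of the smallest dataset that satisfies these requirements (variable names and values are illustrative):

    import numpy as np
    import xarray as xr

    # Minimal dataset: a 'datastream' global attribute and a 'time'
    # coordinate with np.datetime64 values.
    dataset = xr.Dataset(
        data_vars={"temperature": ("time", np.array([10.2, 10.4, 10.1]))},
        coords={
            "time": np.array(
                ["2022-04-01T00:00:00", "2022-04-01T01:00:00", "2022-04-01T02:00:00"],
                dtype="datetime64[ns]",
            )
        },
        attrs={"datastream": "sgp.met.b1"},
    )
    storage.save_data(dataset)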

class tsdat.io.storage.FileSystemS3[source]

Bases: FileSystem

Handles data storage and retrieval for file-based data formats in an AWS S3 bucket.

Parameters
  • parameters (Parameters) – File-system and AWS-specific parameters, such as the root path to where files should be saved, or additional keyword arguments to specific functions used by the storage API. See the FileSystemS3.Parameters class for more details.

  • handler (FileHandler) – The FileHandler class that should be used to handle data I/O within the storage API.

class Parameters[source]

Bases: pydantic.BaseSettings

bucket :str[source]

The name of the S3 bucket that the storage class should attach to.

merge_fetched_data_kwargs :Dict[str, Any][source]

Keyword arguments to xr.merge. Note: this will only be called if the DataReader returns a dictionary of xr.Datasets for a single saved file.

region :str[source]

The AWS region of the storage bucket. Defaults to “us-west-2”.

storage_root :pathlib.Path[source]

The path within the storage bucket where data and ancillary files will be saved. Defaults to the storage/root folder at the top level of the bucket.

parameters :FileSystemS3.Parameters[source]
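
A hedged construction sketch (the bucket name is hypothetical, AWS credentials must already be available to boto3, and instantiation may validate credentials and the bucket via the class validators listed below):

    from tsdat.io.handlers import NetCDFHandler
    from tsdat.io.storage import FileSystemS3

    # Hypothetical bucket; credentials are resolved by boto3 (environment
    # variables, shared credentials file, or an attached IAM role).
    storage = FileSystemS3(
        parameters=FileSystemS3.Parameters(
            bucket="my-tsdat-bucket",
            region="us-west-2",
        ),
        handler=NetCDFHandler(),
    )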

Class Methods

bucket

check_authentication

ensure_bucket_exists

exists

get_obj

save_ancillary_file

Saves an ancillary filepath to the datastream's ancillary storage area.

save_data

Saves a dataset to the storage area.

session

Method Descriptions

property bucket(self)[source]
check_authentication(cls, parameters: Parameters)[source]
ensure_bucket_exists(cls, parameters: Parameters)[source]
exists(self, key: Union[pathlib.Path, str]) -> bool[source]
get_obj(self, key: Union[pathlib.Path, str])[source]
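
The helpers above are thin wrappers around the S3 backend; a hedged sketch of checking for a stored file (the key is hypothetical, and the exact return type of get_obj is not documented here):

    # Hypothetical key under the configured storage root.
    key = "storage/root/data/sgp.met.b1/sgp.met.b1.20220401.000000.nc"
    if storage.exists(key):
        obj = storage.get_obj(key)  # backend object handle for the stored file
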
save_ancillary_file(self, filepath: pathlib.Path, datastream: str)[source]

Saves an ancillary filepath to the datastream’s ancillary storage area.

Parameters
  • filepath (Path) – The path to the ancillary file.

  • datastream (str) – The datastream that the file is related to.

save_data(self, dataset: xarray.Dataset)[source]

Saves a dataset to the storage area.

At a minimum, the dataset must have a ‘datastream’ global attribute and must have a ‘time’ variable with a np.datetime64-like data type.

Parameters

dataset (xr.Dataset) – The dataset to save.

property session(self)[source]

class tsdat.io.storage.ZarrLocalStorage[source]

Bases: tsdat.io.base.Storage

Handles data storage and retrieval for zarr archives on a local filesystem.

Zarr is a special format that writes chunked data to a number of files underneath a given directory. This distribution of data into chunks and distinct files makes zarr an extremely well-suited format for quickly storing and retrieving large quantities of data.

Parameters
  • parameters (Parameters) – File-system specific parameters, such as the root path to where the Zarr archives should be saved, or additional keyword arguments to specific functions used by the storage API. See the Parameters class for more details.

  • handler (ZarrHandler) – The ZarrHandler class that should be used to handle data I/O within the storage API.

class Parameters[source]

Bases: pydantic.BaseSettings

storage_root :pathlib.Path[source]

The path on disk where data and ancillary files will be saved. Defaults to the storage/root folder in the current working directory. The directory is created when this parameter is set, if it does not already exist.

handler :tsdat.io.handlers.ZarrHandler[source]
parameters :ZarrLocalStorage.Parameters[source]
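
A minimal construction sketch, mirroring the FileSystem example above (the storage root is an illustrative assumption):

    from pathlib import Path

    from tsdat.io.handlers import ZarrHandler
    from tsdat.io.storage import ZarrLocalStorage

    # Illustrative storage root; each datastream is written as a zarr archive
    # beneath this directory.
    storage = ZarrLocalStorage(
        parameters=ZarrLocalStorage.Parameters(storage_root=Path("storage/root")),
        handler=ZarrHandler(),
    )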

Class Methods

fetch_data

Fetches data for a given datastream between a specified time range.

save_ancillary_file

Saves an ancillary filepath to the datastream's ancillary storage area.

save_data

Saves a dataset to the storage area.

Method Descriptions

fetch_data(self, start: datetime.datetime, end: datetime.datetime, datastream: str) -> xarray.Dataset[source]

Fetches data for a given datastream between a specified time range.

Parameters
  • start (datetime) – The minimum datetime to fetch (inclusive).

  • end (datetime) – The maximum datetime to fetch (exclusive).

  • datastream (str) – The datastream id to search for.

Returns

xr.Dataset – A dataset containing all the data in the storage area that spans the specified datetimes.

save_ancillary_file(self, filepath: pathlib.Path, datastream: str)[source]

Saves an ancillary filepath to the datastream’s ancillary storage area.

Parameters
  • filepath (Path) – The path to the ancillary file.

  • datastream (str) – The datastream that the file is related to.

save_data(self, dataset: xarray.Dataset)[source]

Saves a dataset to the storage area.

At a minimum, the dataset must have a ‘datastream’ global attribute and must have a ‘time’ variable with a np.datetime64-like data type.

Parameters

dataset (xr.Dataset) – The dataset to save.