tsdat.io.storage
Classes
FileSystem – Handles data storage and retrieval for file-based data formats.
FileSystemS3 – Handles data storage and retrieval for file-based data formats in an AWS S3 bucket.
ZarrLocalStorage – Handles data storage and retrieval for zarr archives on a local filesystem.
- class tsdat.io.storage.FileSystem
Bases:
tsdat.io.base.Storage
Handles data storage and retrieval for file-based data formats.
Formats that write to directories (such as zarr) are not supported by the FileSystem storage class.
- Parameters
parameters (Parameters) – File-system specific parameters, such as the root path to where files should be saved, or additional keyword arguments to specific functions used by the storage API. See the FileSystem.Parameters class for more details.
handler (FileHandler) – The FileHandler class that should be used to handle data I/O within the storage API.
- class Parameters
Bases:
pydantic.BaseSettings
- ancillary_folder :pathlib.Path
The directory under storage_root/ where datastream ancillary folders and files should be saved to. This is primarily used for plots. Defaults to ancillary/.
- ancillary_storage_path :pathlib.Path
The directory structure that should be used when ancillary files are saved. Allows substitution of the following parameters using curly braces ‘{}’:
storage_root: the value from the storage_root parameter.
data_folder: the value from the data_folder parameter.
ancillary_folder: the value from the ancillary_folder parameter.
datastream: the datastream as defined in the dataset configuration file.
location_id: the location_id as defined in the dataset configuration file.
data_level: the data_level as defined in the dataset configuration file.
ext: the file extension (e.g., ‘png’, ‘gif’).
year: the year of the first timestamp in the file.
month: the month of the first timestamp in the file.
day: the day of the first timestamp in the file.
Defaults to {storage_root}/{ancillary_folder}/{datastream}.
- data_folder :pathlib.Path
The directory under storage_root/ where datastream data folders and files should be saved to. Defaults to data/.
- data_storage_path :pathlib.Path
The directory structure that should be used when data files are saved. Allows substitution of the following parameters using curly braces ‘{}’:
storage_root: the value from the storage_root parameter.
data_folder: the value from the data_folder parameter.
ancillary_folder: the value from the ancillary_folder parameter.
datastream: the datastream as defined in the dataset configuration file.
location_id: the location_id as defined in the dataset configuration file.
data_level: the data_level as defined in the dataset configuration file.
year: the year of the first timestamp in the file.
month: the month of the first timestamp in the file.
day: the day of the first timestamp in the file.
extension: the file extension used by the output file writer.
Defaults to {storage_root}/{data_folder}/{datastream}.
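The curly-brace substitution described above works like Python's str.format. The following is a minimal sketch of that idea, not tsdat's actual implementation: resolve_storage_path is a hypothetical helper, and the datastream name, timestamp, and zero-padded date fields are assumptions chosen only for illustration.

```python
from datetime import datetime
from pathlib import Path


def resolve_storage_path(template: str, **substitutions: str) -> Path:
    """Fill a curly-brace path template (hypothetical helper, not a tsdat function)."""
    # str.format ignores keyword arguments that the template does not use.
    return Path(template.format(**substitutions))


# Hypothetical values for illustration only.
first_timestamp = datetime(2022, 4, 5)
substitutions = {
    "storage_root": "storage/root",
    "data_folder": "data",
    "ancillary_folder": "ancillary",
    "datastream": "sgp.example.a1",
    "location_id": "sgp",
    "data_level": "a1",
    # Zero-padding of the date fields is an assumption of this sketch.
    "year": f"{first_timestamp:%Y}",
    "month": f"{first_timestamp:%m}",
    "day": f"{first_timestamp:%d}",
    "extension": "nc",
}

# The default template documented above.
default_path = resolve_storage_path(
    "{storage_root}/{data_folder}/{datastream}", **substitutions
)

# A custom template that groups files by location and date.
dated_path = resolve_storage_path(
    "{storage_root}/{data_folder}/{location_id}/{year}/{month}/{day}",
    **substitutions,
)
```

With these assumed values, the default template resolves to storage/root/data/sgp.example.a1, and the custom template adds one directory level each for location, year, month, and day.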
- merge_fetched_data_kwargs :Dict[str, Any]
Keyword arguments passed to xr.merge.
Note that xr.merge is only called if the DataReader returns a dictionary of xr.Datasets for a single input key.
- storage_root :pathlib.Path
The path on disk where data and ancillary files will be saved. Defaults to the storage/root folder in the active working directory. If the directory does not already exist, it is created when this parameter is set.
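The create-if-missing behavior of storage_root can be pictured with pathlib. This is a sketch under assumptions, not the library's code: ensure_storage_root is a hypothetical helper, and a temporary directory stands in for the working directory.

```python
import tempfile
from pathlib import Path


def ensure_storage_root(storage_root: Path) -> Path:
    """Create the directory and any missing parents (hypothetical helper)."""
    storage_root.mkdir(parents=True, exist_ok=True)
    return storage_root


# Use a temporary directory so the example does not touch the working directory.
with tempfile.TemporaryDirectory() as tmp:
    root = ensure_storage_root(Path(tmp) / "storage" / "root")
    created = root.is_dir()
    # Calling it again is harmless because exist_ok=True suppresses the error.
    ensure_storage_root(root)
```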
Class Methods
fetch_data – Fetches data for a given datastream between a specified time range.
save_ancillary_file – Saves an ancillary filepath to the datastream's ancillary storage area.
save_data – Saves a dataset to the storage area.
Method Descriptions
- fetch_data(self, start: datetime.datetime, end: datetime.datetime, datastream: str) → xarray.Dataset
Fetches data for a given datastream between a specified time range.
Note: this method selects data files by their filenames and does not filter the data within each file, so the returned dataset may include timestamps outside the requested range.
- Parameters
start (datetime) – The minimum datetime to fetch.
end (datetime) – The maximum datetime to fetch.
datastream (str) – The datastream id to search for.
- Returns
xr.Dataset – A dataset containing all the data in the storage area that spans the specified datetimes.
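To illustrate what filename-based selection implies, here is a minimal sketch. It is not tsdat's implementation: select_files_by_name is a hypothetical helper, the filename layout '<datastream>.<YYYYMMDD>.<HHMMSS>.<ext>' is an assumption, and so is the treatment of the start as inclusive and the end as exclusive.

```python
from datetime import datetime
from typing import List


def select_files_by_name(
    filenames: List[str], start: datetime, end: datetime, datastream: str
) -> List[str]:
    """Keep files whose filename timestamp falls in [start, end).

    Assumes names like '<datastream>.<YYYYMMDD>.<HHMMSS>.<ext>', which is an
    assumption made for this sketch.
    """
    prefix = datastream + "."
    selected = []
    for name in filenames:
        if not name.startswith(prefix):
            continue  # belongs to a different datastream
        parts = name[len(prefix):].split(".")
        if len(parts) < 3:
            continue  # does not match the assumed layout
        try:
            file_time = datetime.strptime(parts[0] + parts[1], "%Y%m%d%H%M%S")
        except ValueError:
            continue  # date or time part is not parseable
        if start <= file_time < end:
            selected.append(name)
    return sorted(selected)


files = [
    "sgp.example.a1.20220404.000000.nc",
    "sgp.example.a1.20220405.000000.nc",
    "sgp.example.a1.20220406.000000.nc",
    "sgp.other.a1.20220405.000000.nc",
]
selected = select_files_by_name(
    files, datetime(2022, 4, 5), datetime(2022, 4, 5, 12), "sgp.example.a1"
)
```

Here only the 2022-04-05 file is selected. If that file holds a full day of data, all of it is returned even though the request ends at 12:00, because nothing is filtered inside the file.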
- class tsdat.io.storage.FileSystemS3
Bases:
FileSystem
Handles data storage and retrieval for file-based data formats in an AWS S3 bucket.
- Parameters
parameters (Parameters) – File-system and AWS-specific parameters, such as the root path to where files should be saved, or additional keyword arguments to specific functions used by the storage API. See the FileSystemS3.Parameters class for more details.
handler (FileHandler) – The FileHandler class that should be used to handle data I/O within the storage API.
- class Parameters
Bases:
FileSystem.Parameters
Additional parameters for S3 storage.
Note that all settings and parameters from FileSystem.Parameters are also supported by FileSystemS3.Parameters.
- bucket :str
The name of the S3 bucket that the storage class should use.
Note: This parameter can also be set via the TSDAT_S3_BUCKET_NAME environment variable.
- merge_fetched_data_kwargs :Dict[str, Any]
Keyword arguments passed to xr.merge. Note that xr.merge is only called if the DataReader returns a dictionary of xr.Datasets for a single saved file.
- region :str
The AWS region of the storage bucket.
Note: This parameter can also be set via the AWS_DEFAULT_REGION environment variable.
Defaults to us-west-2.
- storage_root :pathlib.Path
The path on disk where data and ancillary files will be saved to.
Note: This parameter can also be set via the TSDAT_STORAGE_ROOT environment variable.
Defaults to the storage/root folder in the top level of the storage bucket.
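The three environment variables named above can be pictured with a small standard-library sketch. S3SettingsSketch and load_s3_settings are hypothetical stand-ins written for illustration, not the FileSystemS3.Parameters class. The source states no default for bucket, so the sketch leaves it as None when the variable is unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class S3SettingsSketch:
    """Hypothetical stand-in for illustration; not the tsdat Parameters class."""

    bucket: Optional[str]
    region: str
    storage_root: str


def load_s3_settings(env: Mapping[str, str] = os.environ) -> S3SettingsSketch:
    """Read the documented environment variables, falling back to documented defaults."""
    return S3SettingsSketch(
        # No default is documented for the bucket, so None means "not set".
        bucket=env.get("TSDAT_S3_BUCKET_NAME"),
        region=env.get("AWS_DEFAULT_REGION", "us-west-2"),
        storage_root=env.get("TSDAT_STORAGE_ROOT", "storage/root"),
    )


# Pass an explicit mapping so the example does not depend on the real environment.
settings = load_s3_settings({"TSDAT_S3_BUCKET_NAME": "my-example-bucket"})
overridden = load_s3_settings(
    {"TSDAT_S3_BUCKET_NAME": "my-example-bucket", "AWS_DEFAULT_REGION": "us-east-1"}
)
```

In this sketch, settings.region falls back to us-west-2 and settings.storage_root to storage/root, while overridden.region takes the value supplied in the mapping.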
Class Methods
save_ancillary_file – Saves an ancillary filepath to the datastream's ancillary storage area.
save_data – Saves a dataset to the storage area.
Method Descriptions
- class tsdat.io.storage.ZarrLocalStorage
Bases:
FileSystem
Handles data storage and retrieval for zarr archives on a local filesystem.
Zarr is a special format that writes chunked data to a number of files underneath a given directory. This distribution of data into chunks and distinct files makes zarr an extremely well-suited format for quickly storing and retrieving large quantities of data.
- Parameters
parameters (Parameters) – File-system specific parameters, such as the root path to where the Zarr archives should be saved, or additional keyword arguments to specific functions used by the storage API. See the Parameters class for more details.
handler (ZarrHandler) – The ZarrHandler class that should be used to handle data I/O within the storage API.
- class Parameters
Bases:
FileSystem.Parameters
- data_storage_path :pathlib.Path
The directory structure that should be used when data files are saved. Allows substitution of the following parameters using curly braces ‘{}’:
storage_root: the value from the storage_root parameter.
data_folder: the value from the data_folder parameter.
ancillary_folder: the value from the ancillary_folder parameter.
datastream: the datastream as defined in the dataset configuration file.
location_id: the location_id as defined in the dataset configuration file.
data_level: the data_level as defined in the dataset configuration file.
year: the year of the first timestamp in the file.
month: the month of the first timestamp in the file.
day: the day of the first timestamp in the file.
extension: the file extension used by the output file writer.
Defaults to {storage_root}/{data_folder}/{datastream}.{extension}.
Class Methods
fetch_data – Fetches data for a given datastream between a specified time range.
Method Descriptions
- fetch_data(self, start: datetime.datetime, end: datetime.datetime, datastream: str) → xarray.Dataset
Fetches data for a given datastream between a specified time range.
- Parameters
start (datetime) – The minimum datetime to fetch (inclusive).
end (datetime) – The maximum datetime to fetch (exclusive).
datastream (str) – The datastream id to search for.
- Returns
xr.Dataset – A dataset containing all the data in the storage area that spans the specified datetimes.
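The inclusive start and exclusive end describe a half-open interval. The sketch below shows that convention on plain Python datetimes; it does not call tsdat or zarr, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

start = datetime(2022, 4, 5)
end = datetime(2022, 4, 6)

# Hypothetical timestamps every 12 hours, from 2022-04-04 12:00 to 2022-04-06 12:00.
timestamps = [start + timedelta(hours=12 * i) for i in range(-1, 4)]

# Half-open selection: start is included, end is excluded.
in_range = [t for t in timestamps if start <= t < end]
```

Only 2022-04-05 00:00 and 2022-04-05 12:00 are kept. A timestamp exactly equal to end is left out, so a follow-up request that starts at the previous end does not return the same point twice.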