
file_system_s3

Classes:

Name Description
FileSystemS3

Handles data storage and retrieval for file-based data in an AWS S3 bucket.

S3Object

A structural typing protocol for the S3 object attributes used by this module.

Classes#

FileSystemS3 #

Bases: FileSystem

Handles data storage and retrieval for file-based data in an AWS S3 bucket.

Classes:

Name Description
Parameters

Additional parameters for S3 storage.

Methods:

Name Description
last_modified

Returns the datetime of the last modification to the datastream's storage area.

modified_since

Returns the data datetimes of all files modified after the specified time.

save_ancillary_file

Saves an ancillary filepath to the datastream's ancillary storage area.

save_data

Saves the dataset to the datastream's storage area in the S3 bucket.

Attributes:

Name Type Description
parameters Parameters

File-system and AWS-specific parameters, such as the path to where files should be saved or additional keyword arguments to specific functions used by the storage API.

Attributes#

parameters class-attribute instance-attribute #
parameters: Parameters = Field(default_factory=Parameters)

File-system and AWS-specific parameters, such as the path to where files should be saved or additional keyword arguments to specific functions used by the storage API. See the FileSystemS3.Parameters class for more details.

Classes#

Parameters #

Bases: Parameters

Additional parameters for S3 storage.

Note that all settings and parameters from FileSystem.Parameters are also supported by FileSystemS3.Parameters.

Attributes:

Name Type Description
bucket str

The name of the S3 bucket that the storage class should use.

region str

The AWS region of the storage bucket.

Attributes#
bucket class-attribute instance-attribute #
bucket: str = Field(
    "tsdat-storage", env="TSDAT_S3_BUCKET_NAME"
)

The name of the S3 bucket that the storage class should use.

Note

This parameter can also be set via the TSDAT_S3_BUCKET_NAME environment variable.

region class-attribute instance-attribute #
region: str = Field("us-west-2", env="AWS_DEFAULT_REGION")

The AWS region of the storage bucket.

Note

This parameter can also be set via the AWS_DEFAULT_REGION environment variable.

Defaults to us-west-2.
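As a rough illustration, the two defaults above behave like environment-variable lookups with fallbacks. This standalone sketch shows the resolution order only; it is not tsdat's actual implementation, which uses pydantic `Field` settings:

```python
import os
from typing import Mapping


def resolve_setting(name: str, default: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the environment override if present, otherwise the default."""
    return env.get(name, default)


# With no override, the documented defaults apply.
bucket = resolve_setting("TSDAT_S3_BUCKET_NAME", "tsdat-storage", env={})

# An environment variable takes precedence over the default.
region = resolve_setting(
    "AWS_DEFAULT_REGION", "us-west-2", env={"AWS_DEFAULT_REGION": "eu-central-1"}
)
```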

Functions#

last_modified #
last_modified(datastream: str) -> Union[datetime, None]

Returns the datetime of the last modification to the datastream's storage area.

Source code in tsdat/io/storage/file_system_s3.py
def last_modified(self, datastream: str) -> Union[datetime, None]:
    """Returns the datetime of the last modification to the datastream's storage
    area."""
    filepath_glob = self.data_filepath_template.substitute(
        self._get_substitutions(datastream=datastream),
        allow_missing=True,
        fill=".*",
    )
    s3_objects = self._get_matching_s3_objects(filepath_glob)

    last_modified = None
    for obj in s3_objects:
        if obj.last_modified is not None:
            mod_time = obj.last_modified.astimezone(timezone.utc)
            last_modified = (
                mod_time if last_modified is None else max(last_modified, mod_time)
            )
    return last_modified
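The reduction in the loop above can be exercised without S3 at all. This is a minimal standalone sketch using plain dicts in place of S3 objects; the keys and timestamps are hypothetical:

```python
from datetime import datetime, timezone

# Hypothetical stand-ins for S3 objects: only the fields the loop reads.
s3_objects = [
    {"key": "a.nc", "last_modified": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"key": "b.nc", "last_modified": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"key": "c.nc", "last_modified": None},  # objects without a timestamp are skipped
]

last_modified = None
for obj in s3_objects:
    if obj["last_modified"] is not None:
        mod_time = obj["last_modified"].astimezone(timezone.utc)
        last_modified = (
            mod_time if last_modified is None else max(last_modified, mod_time)
        )
# last_modified now holds the most recent UTC timestamp, or None if no
# object carried one.
```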
modified_since #
modified_since(
    datastream: str, last_modified: datetime
) -> List[datetime]

Returns the data datetimes of all files modified after the specified time.

Source code in tsdat/io/storage/file_system_s3.py
def modified_since(
    self, datastream: str, last_modified: datetime
) -> List[datetime]:
    """Returns the data datetimes of all files modified after the specified time."""
    filepath_glob = self.data_filepath_template.substitute(
        self._get_substitutions(
            datastream=datastream,
        ),
        allow_missing=True,
        fill=".*",
    )
    s3_objects = self._get_matching_s3_objects(filepath_glob)
    return [
        get_file_datetime(
            Path(obj.key).name, self.parameters.data_filename_template
        )
        for obj in s3_objects
        if (
            obj.last_modified is not None
            and obj.last_modified.astimezone(timezone.utc) > last_modified
        )
    ]
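The filtering above can be sketched without S3 or tsdat's `get_file_datetime` helper. The parser below is hypothetical and simply assumes filenames embed a `YYYYMMDD.HHMMSS` pair, in the style of tsdat's standard filename templates:

```python
from datetime import datetime, timezone


def file_datetime(key: str) -> datetime:
    # Hypothetical parser: assumes <datastream parts>.YYYYMMDD.HHMMSS.<ext>
    date_part, time_part = key.rsplit("/", 1)[-1].split(".")[3:5]
    return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")


cutoff = datetime(2023, 3, 1, tzinfo=timezone.utc)
s3_objects = [
    {"key": "sgp.met.b1.20230101.000000.nc",
     "last_modified": datetime(2023, 2, 1, tzinfo=timezone.utc)},
    {"key": "sgp.met.b1.20230401.000000.nc",
     "last_modified": datetime(2023, 5, 1, tzinfo=timezone.utc)},
]

# Keep the data datetimes of files whose objects were modified after the cutoff.
newer = [
    file_datetime(obj["key"])
    for obj in s3_objects
    if obj["last_modified"] is not None
    and obj["last_modified"].astimezone(timezone.utc) > cutoff
]
```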
save_ancillary_file #
save_ancillary_file(filepath: Path, target_path: Path)

Saves an ancillary filepath to the datastream's ancillary storage area.

NOTE: In most cases this function should not be used directly. Instead, prefer using the self.uploadable_dir(*args, **kwargs) method.

Parameters:

Name Type Description Default
filepath Path

The path to the ancillary file. This is expected to have a standardized filename and should be saved under the ancillary storage path.

required
target_path Path

The path to where the data should be saved.

required
Source code in tsdat/io/storage/file_system_s3.py
def save_ancillary_file(self, filepath: Path, target_path: Path):  # type: ignore
    """Saves an ancillary filepath to the datastream's ancillary storage area.

    NOTE: In most cases this function should not be used directly. Instead, prefer
    using the ``self.uploadable_dir(*args, **kwargs)`` method.

    Args:
        filepath (Path): The path to the ancillary file. This is expected to have
            a standardized filename and should be saved under the ancillary storage
            path.
        target_path (Path): The path to where the data should be saved.
    """
    self._bucket.upload_file(Filename=str(filepath), Key=target_path.as_posix())
    logger.info("Saved ancillary file to: %s", target_path.as_posix())
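A small illustration of the key construction used in the upload above: S3 keys are `/`-separated regardless of platform, which is why the call uses `as_posix()`. The paths below are hypothetical:

```python
from pathlib import Path

# Hypothetical ancillary file and target location within the bucket.
filepath = Path("plots") / "sgp.met.b1.20230401.000000.temperature.png"
target_path = Path("ancillary") / "sgp.met.b1" / filepath.name

# as_posix() guarantees forward slashes even on Windows, as S3 keys require.
key = target_path.as_posix()
```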
save_data #
save_data(dataset: xr.Dataset, **kwargs: Any)

Writes the dataset to a temporary directory using the configured handler, then uploads the resulting file(s) to the S3 bucket.
Source code in tsdat/io/storage/file_system_s3.py
def save_data(self, dataset: xr.Dataset, **kwargs: Any):
    filepath = Path(
        self.data_filepath_template.substitute(
            self._get_substitutions(
                dataset=dataset,
                extension=self.handler.extension
                or self.handler.writer.file_extension,
            ),
            allow_missing=False,
        )
    )
    with tempfile.TemporaryDirectory() as tmp_dir:
        self.handler.writer.write(dataset, Path(tmp_dir) / filepath.name)
        for file in Path(tmp_dir).glob("**/*"):
            if file.is_file():
                key = (filepath.parent / file.relative_to(tmp_dir)).as_posix()
                self._bucket.upload_file(Filename=file.as_posix(), Key=key)
                logger.info(
                    "Saved %s data file to s3://%s/%s",
                    dataset.attrs["datastream"],
                    self.parameters.bucket,
                    key,
                )
    return None
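The write-to-tempdir-then-upload pattern above can be exercised locally by swapping the bucket upload for a plain function. Everything here besides the glob and key logic is a hypothetical stand-in:

```python
import tempfile
from pathlib import Path

uploaded = []  # records (local file, S3 key) pairs in place of a real upload


def fake_upload(filename: str, key: str) -> None:
    uploaded.append((filename, key))


# Hypothetical target path produced by a filepath template.
filepath = Path("sgp/sgp.met.b1/sgp.met.b1.20230401.000000.nc")

with tempfile.TemporaryDirectory() as tmp_dir:
    # A handler's writer may produce one file or a directory tree; write one file.
    (Path(tmp_dir) / filepath.name).write_text("placeholder file contents")
    for file in Path(tmp_dir).glob("**/*"):
        if file.is_file():
            key = (filepath.parent / file.relative_to(tmp_dir)).as_posix()
            fake_upload(file.as_posix(), key)
```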

S3Object #

Bases: Protocol

A structural typing protocol for the attributes of an S3 object that this module uses.

Attributes:

Name Type Description
key str
last_modified datetime

Attributes#

key instance-attribute #
key: str
last_modified instance-attribute #
last_modified: datetime
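Because S3Object is a Protocol, any object exposing these two attributes satisfies it structurally, with no inheritance required. The stand-in class below is hypothetical:

```python
from datetime import datetime, timezone
from typing import Protocol


class S3Object(Protocol):
    key: str
    last_modified: datetime


class FakeObjectSummary:
    """A minimal stand-in mimicking the attributes of an S3 object summary."""

    def __init__(self, key: str, last_modified: datetime):
        self.key = key
        self.last_modified = last_modified


# Type-checks against the protocol because the attribute names and types match.
obj: S3Object = FakeObjectSummary(
    "data/sgp.met.b1.20230401.000000.nc",
    datetime(2023, 4, 1, tzinfo=timezone.utc),
)
```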
