
file_system

Classes:

  • FileSystem: Handles data storage and retrieval for file-based data formats.

Classes#

FileSystem #

Bases: Storage

Handles data storage and retrieval for file-based data formats.

Formats that write to directories (such as zarr) are not supported by the FileSystem storage class.

Classes:

  • Parameters

Methods:

  • fetch_data: Fetches data for a given datastream between a specified time range.
  • last_modified: Find the last modified time for any data in that datastream.
  • modified_since: Find the list of data dates that have been modified since the passed last modified date.
  • save_ancillary_file: Saves an ancillary filepath to the datastream's ancillary storage area.
  • save_data: Saves a dataset to the storage area.

Attributes:

  • data_filepath_template (Template)
  • handler (FileHandler): The FileHandler class that should be used to handle data I/O within the storage API.
  • parameters (Parameters): File-system specific parameters, such as the root path to where files should be saved.

Attributes#

data_filepath_template property #
data_filepath_template: Template
handler class-attribute instance-attribute #
handler: FileHandler = Field(default_factory=NetCDFHandler)

The FileHandler class that should be used to handle data I/O within the storage API.

parameters class-attribute instance-attribute #
parameters: Parameters = Field(
    default_factory=Parameters, help="Some help text?"
)

File-system specific parameters, such as the root path to where files should be saved, or additional keyword arguments to specific functions used by the storage API. See the FileSystemStorage.Parameters class for more details.
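
To make this concrete, below is a minimal sketch of constructing a FileSystem storage object. The import path follows the source file shown later on this page (tsdat/io/storage/file_system.py), and the storage_root value is a hypothetical example; all other parameters keep the defaults documented here.

from pathlib import Path

from tsdat.io.storage.file_system import FileSystem

# Minimal sketch: a FileSystem storage rooted at a hypothetical local
# directory. storage_root comes from the base Parameters class; the
# filename and path templates keep their defaults shown below.
storage = FileSystem(
    parameters=FileSystem.Parameters(storage_root=Path("/data/archive")),
)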

Classes#

Parameters #

Bases: Parameters

Attributes:

  • data_filename_template (str): Template string to use for data filenames.
  • data_storage_path (Path): The directory structure under storage_root where data files are saved.

Attributes#
data_filename_template class-attribute instance-attribute #
data_filename_template: str = (
    "{datastream}.{yyyy}{mm}{dd}.{HH}{MM}{SS}.{extension}"
)

Template string to use for data filenames.

Allows substitution of the following parameters using curly braces '{}':

  • ext or extension: the file extension from the storage data handler
  • datastream from the dataset's global attributes
  • location_id from the dataset's global attributes
  • data_level from the dataset's global attributes
  • date_time: the first timestamp in the file formatted as "YYYYMMDD.hhmmss"
  • Any other global attribute that has a string or integer data type.

At a minimum the template must include {date_time}.
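
As a hedged illustration, the sketch below overrides the template using only fields from the list above (reusing the FileSystem import from the earlier sketch); the datastream and resulting filenames are hypothetical.

# Sketch: a filename template built from the documented fields. Since
# {date_time} expands to "YYYYMMDD.hhmmss", this yields names such as
# "sgp.met.b1.20240102.000000.nc" for a hypothetical datastream.
params = FileSystem.Parameters(
    data_filename_template="{datastream}.{date_time}.{extension}"
)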

data_storage_path class-attribute instance-attribute #
data_storage_path: Path = Path(
    "data/{location_id}/{datastream}"
)

The directory structure under storage_root where data files are saved.

Allows substitution of the following parameters using curly braces '{}':

  • storage_root: the value from the storage_root parameter.
  • datastream: the datastream as defined in the dataset config file.
  • location_id: the location_id as defined in the dataset config file.
  • data_level: the data_level as defined in the dataset config file.
  • year: the year of the first timestamp in the file.
  • month: the month of the first timestamp in the file.
  • day: the day of the first timestamp in the file.
  • extension: the file extension used by the output file writer.

Defaults to data/{location_id}/{datastream}.
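
Putting the two templates together, the sketch below resolves the defaults by hand with plain string formatting (outside of tsdat) to show where a file would land relative to storage_root; the datastream, location, and timestamp are hypothetical.

from pathlib import Path

# Illustrative only: resolve the default directory and filename templates
# for a hypothetical datastream written at 2024-01-02 00:00:00 as netCDF.
directory = Path(
    "data/{location_id}/{datastream}".format(
        location_id="sgp", datastream="sgp.met.b1"
    )
)
filename = "{datastream}.{yyyy}{mm}{dd}.{HH}{MM}{SS}.{extension}".format(
    datastream="sgp.met.b1",
    yyyy="2024", mm="01", dd="02",
    HH="00", MM="00", SS="00",
    extension="nc",
)
print(directory / filename)
# data/sgp/sgp.met.b1/sgp.met.b1.20240102.000000.nc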

Functions#

fetch_data #
fetch_data(
    start: datetime,
    end: datetime,
    datastream: str,
    metadata_kwargs: Union[Dict[str, str], None] = None,
    **kwargs: Any
) -> xr.Dataset

Fetches data for a given datastream between a specified time range.

Parameters:

  • start (datetime): The minimum datetime to fetch. Required.
  • end (datetime): The maximum datetime to fetch. Required.
  • datastream (str): The datastream id to search for. Required.
  • metadata_kwargs (dict[str, str], optional): Metadata substitutions to help resolve the data storage path. This is only required if the template data storage path includes any properties other than datastream or fields contained in the datastream. Defaults to None.

Returns:

  • xr.Dataset: A dataset containing all the data in the storage area that spans the specified datetimes.


Source code in tsdat/io/storage/file_system.py
def fetch_data(
    self,
    start: datetime,
    end: datetime,
    datastream: str,
    metadata_kwargs: Union[Dict[str, str], None] = None,
    **kwargs: Any,
) -> xr.Dataset:
    """-----------------------------------------------------------------------------
    Fetches data for a given datastream between a specified time range.

    Args:
        start (datetime): The minimum datetime to fetch.
        end (datetime): The maximum datetime to fetch.
        datastream (str): The datastream id to search for.
        metadata_kwargs (dict[str, str], optional): Metadata substitutions to help
            resolve the data storage path. This is only required if the template
            data storage path includes any properties other than datastream or
            fields contained in the datastream. Defaults to None.

    Returns:
        xr.Dataset: A dataset containing all the data in the storage area that spans
        the specified datetimes.

    -----------------------------------------------------------------------------"""
    data_files = self._find_data(
        start, end, datastream, metadata_kwargs=metadata_kwargs
    )
    datasets = self._open_data_files(*sorted(data_files))
    dataset = xr.Dataset()
    if len(datasets) == 0:
        logger.warning(
            "No data found for %s in range %s - %s", datastream, start, end
        )
    elif len(datasets) == 1:
        dataset = datasets[0].sel(time=slice(start, end))
    else:
        dataset = xr.concat(datasets, dim="time")  # type: ignore
        dataset = dataset.sel(time=slice(start, end))
    return dataset
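
A hedged usage sketch, reusing the storage instance from the earlier example; the datastream id is hypothetical.

from datetime import datetime

# Fetch one day of data. Per the source above, an empty dataset is
# returned (with a logged warning) when nothing matches the range.
ds = storage.fetch_data(
    start=datetime(2024, 1, 2),
    end=datetime(2024, 1, 3),
    datastream="sgp.met.b1",
)
print(ds)
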
last_modified #
last_modified(datastream: str) -> Union[datetime, None]

Find the last modified time for any data in that datastream.

Parameters:

  • datastream (str): The datastream. Required.

Returns:

  • Union[datetime, None]: The datetime of the last modification, or None if the datastream has no stored data.

Source code in tsdat/io/storage/file_system.py
def last_modified(self, datastream: str) -> Union[datetime, None]:
    """Find the last modified time for any data in that datastream.

    Args:
        datastream (str): The datastream.

    Returns:
        datetime: The datetime of the last modification.
    """
    filepath_glob = self.data_filepath_template.substitute(
        self._get_substitutions(datastream=datastream),
        allow_missing=True,
        fill="*",
    )
    filepath_glob = re.sub(r"\*+", "*", filepath_glob)
    matches = self._get_matching_files(filepath_glob)
    last_modified = None
    for file in matches:
        mod_timestamp = file.lstat().st_mtime
        mod_time = datetime.fromtimestamp(mod_timestamp).astimezone(timezone.utc)
        last_modified = (
            mod_time if last_modified is None else max(last_modified, mod_time)
        )
    return last_modified
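
A short usage sketch with a hypothetical datastream id; per the loop above, the result is a UTC-aware datetime, or None when no files exist.

# Sketch: check when the datastream was last written.
mod_time = storage.last_modified("sgp.met.b1")
if mod_time is None:
    print("no data stored yet")
else:
    print("last modified:", mod_time.isoformat())
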
modified_since #
modified_since(
    datastream: str, last_modified: datetime
) -> List[datetime]

Find the list of data dates that have been modified since the passed last modified date.

Parameters:

  • datastream (str): The datastream to search. Required.
  • last_modified (datetime): Should be equivalent to run date (the last time data were changed). Required.

Returns:

  • List[datetime]: The data dates of files that were changed since the last modified date.

Source code in tsdat/io/storage/file_system.py
def modified_since(
    self, datastream: str, last_modified: datetime
) -> List[datetime]:
    """Find the list of data dates that have been modified since the passed
    last modified date.

    Args:
        datastream (str): The datastream to search.
        last_modified (datetime): Should be equivalent to run date (the last time
            data were changed)

    Returns:
        List[datetime]: The data dates of files that were changed since the last
            modified date
    """
    filepath_glob = self.data_filepath_template.substitute(
        self._get_substitutions(datastream=datastream),
        allow_missing=True,
        fill="*",
    )
    filepath_glob = re.sub(r"\*+", "*", filepath_glob)
    matches = self._get_matching_files(filepath_glob)
    results: list[datetime] = []
    for file in matches:
        mod_timestamp = file.lstat().st_mtime
        mod_time = datetime.fromtimestamp(mod_timestamp).astimezone(timezone.utc)
        if mod_time > last_modified:
            data_timestamp = get_file_datetime(
                file.name, self.parameters.data_filename_template
            )
            results.append(data_timestamp)
    return results
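
A hedged sketch of using this for incremental reprocessing; the datastream id and cutoff are hypothetical. The cutoff should be timezone-aware, since the source compares it against UTC-converted modification times.

from datetime import datetime, timezone

# Sketch: list data dates written since a previous (hypothetical) run.
previous_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
for data_date in storage.modified_since("sgp.met.b1", previous_run):
    print("changed since last run:", data_date)
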
save_ancillary_file #
save_ancillary_file(
    filepath: Path, target_path: Union[Path, None] = None
)

Saves an ancillary filepath to the datastream's ancillary storage area.

NOTE: In most cases this function should not be used directly. Instead, prefer using the self.uploadable_dir(*args, **kwargs) method.

Parameters:

  • filepath (Path): The path to the ancillary file. This is expected to have a standardized filename and should be saved under the ancillary storage path. Required.
  • target_path (Path, optional): The path to where the data should be saved. Defaults to None.
Source code in tsdat/io/storage/file_system.py
def save_ancillary_file(
    self, filepath: Path, target_path: Union[Path, None] = None
):
    """Saves an ancillary filepath to the datastream's ancillary storage area.

    NOTE: In most cases this function should not be used directly. Instead, prefer
    using the ``self.uploadable_dir(*args, **kwargs)`` method.

    Args:
        filepath (Path): The path to the ancillary file. This is expected to have
            a standardized filename and should be saved under the ancillary storage
            path.
        target_path (str): The path to where the data should be saved.
    """
    target_path.parent.mkdir(exist_ok=True, parents=True)
    saved_filepath = shutil.copy2(filepath, target_path)
    logger.info("Saved ancillary file to: %s", saved_filepath)
save_data #
save_data(dataset: xr.Dataset, **kwargs: Any)

Saves a dataset to the storage area.

At a minimum, the dataset must have a 'datastream' global attribute and must have a 'time' variable with a np.datetime64-like data type.

Parameters:

  • dataset (xr.Dataset): The dataset to save. Required.

Source code in tsdat/io/storage/file_system.py
def save_data(self, dataset: xr.Dataset, **kwargs: Any):
    """-----------------------------------------------------------------------------
    Saves a dataset to the storage area.

    At a minimum, the dataset must have a 'datastream' global attribute and must
    have a 'time' variable with a np.datetime64-like data type.

    Args:
        dataset (xr.Dataset): The dataset to save.

    -----------------------------------------------------------------------------"""
    datastream = dataset.attrs["datastream"]
    substitutions = self._get_substitutions(datastream=datastream, dataset=dataset)
    filepath = Path(
        self.data_filepath_template.substitute(substitutions, allow_missing=False)
    )
    filepath.parent.mkdir(exist_ok=True, parents=True)
    self.handler.writer.write(dataset, filepath)
    logger.info("Saved %s dataset to %s", datastream, filepath.as_posix())
