zip_reader

Classes:

Name Description
ZipReader

DataReader for reading from a zipped archive. Writing to this format is not supported.

Classes#

ZipReader #

ZipReader(parameters: Dict = None)

Bases: ArchiveReader

DataReader for reading from a zipped archive. Writing to this format is not supported.

This class requires that readers be specified in the parameters section of the storage configuration file. The structure of this nested readers section should mirror the structure of its parent readers section. To illustrate, consider the following configuration block:

readers:
  .*:
    zip:
      file_pattern: .*zip
      classname: tsdat.io.readers.ZipReader
      parameters:
        # Parameters to specify how the ZipReader should read/unpack the archive.
        # Parameters here are passed to the Python open() function as kwargs. The
        # default value is shown below.
        open_zip_kwargs:
          mode: "rb"

        # Parameters here are passed to zipfile.ZipFile() as kwargs. Useful
        # for specifying the system encoding or compression algorithm to use for
        # unpacking the archive. These are optional.
        read_zip_kwargs:
          mode: "r"

        # The readers section tells the ZipReader which DataReaders should be
        # used to read the unpacked files.
        readers:
          .*csv:
            classname: tsdat.io.readers.CSVReader
            parameters:  # Parameters specific to tsdat.io.readers.CSVReader
              read_csv_kwargs:
                sep: '\t'

        # Pattern(s) used to exclude certain files in the archive from being handled.
        # This parameter is optional, and the default value is shown below:
        exclude: ['.*__MACOSX/.*', '.*DS_Store']

Classes:

Name Description
Parameters

Methods:

Name Description
read

Extracts the file into memory and uses registered DataReaders to read each relevant extracted file into its own xarray Dataset object.

Attributes:

Name Type Description
parameters Parameters
Source code in tsdat/io/base/archive_reader.py
def __init__(self, parameters: Dict = None):  # type: ignore
    super().__init__(parameters=parameters)

    # Naively merge a list of regex patterns to exclude certain files from being
    # read. By default we exclude files that macOS creates when zipping a folder.
    exclude = [".*\\_\\_MACOSX/.*", ".*\\.DS_Store"]
    exclude.extend(getattr(self.parameters, "exclude", []))
    self.parameters.exclude = "(?:% s)" % "|".join(exclude)
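
The pattern merge in the constructor above can be tried in isolation. The following is a minimal sketch using only the standard library: the default patterns are copied from the listing, while `user_exclude` and the archive member names are hypothetical values chosen for illustration.

```python
import re

# Default patterns, copied from the __init__ listing above.
default_exclude = [".*\\_\\_MACOSX/.*", ".*\\.DS_Store"]

# Hypothetical user-supplied pattern (what `exclude` in the config might hold).
user_exclude = [".*\\.txt"]

# Join all patterns into one non-capturing alternation group.
merged = "(?:%s)" % "|".join(default_exclude + user_exclude)

# Hypothetical archive member names.
names = [
    "data/a.csv",
    "__MACOSX/data/._a.csv",
    "data/.DS_Store",
    "notes.txt",
]

# re.match anchors at the start of the string, which is why each pattern
# begins with ".*" so it can match anywhere in the member path.
kept = [name for name in names if not re.match(merged, name)]
print(kept)
```

Because the user patterns are appended to the defaults rather than replacing them, the macOS entries stay excluded even when a custom `exclude` list is configured.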

Attributes#

parameters class-attribute instance-attribute #
parameters: Parameters = Parameters()

Classes#

Parameters #

Bases: BaseModel

Attributes:

Name Type Description
exclude List[str]
open_zip_kwargs Dict[str, Any]
read_zip_kwargs Dict[str, Any]
readers Dict[str, Any]
Attributes#
exclude class-attribute instance-attribute #
exclude: List[str] = []
open_zip_kwargs class-attribute instance-attribute #
open_zip_kwargs: Dict[str, Any] = {}
read_zip_kwargs class-attribute instance-attribute #
read_zip_kwargs: Dict[str, Any] = {}
readers class-attribute instance-attribute #
readers: Dict[str, Any] = {}
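
To make the roles of `open_zip_kwargs` and `read_zip_kwargs` concrete, here is a minimal standard-library sketch of how the `read` listing on this page consumes them: the first dict updates the defaults passed to `open()`, and the second is forwarded to `zipfile.ZipFile`. The temporary archive and its contents are hypothetical, and the `with` blocks are an addition of this sketch, not something the listing does.

```python
import os
import tempfile
from zipfile import ZipFile

# Stand-ins for the two Parameters attributes (values are illustrative).
open_zip_kwargs = {}              # extra kwargs for open(); empty by default
read_zip_kwargs = {"mode": "r"}   # kwargs forwarded to ZipFile(...)

with tempfile.TemporaryDirectory() as tmp:
    # Build a small throwaway archive on disk (hypothetical contents).
    path = os.path.join(tmp, "example.zip")
    with ZipFile(path, mode="w") as zf:
        zf.writestr("a.csv", "x\ty\n1\t2\n")

    # Start from the default mode and let user-supplied kwargs override it.
    open_params = dict(mode="rb")
    open_params.update(open_zip_kwargs)

    # Open the file in binary mode, then hand the file object to ZipFile.
    with open(path, **open_params) as fileobj:
        with ZipFile(file=fileobj, **read_zip_kwargs) as zip_file:
            names = zip_file.namelist()

print(names)
```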

Functions#

read #
read(input_key: str) -> Dict[str, xr.Dataset]

Extracts the file into memory and uses registered DataReaders to read each relevant extracted file into its own xarray Dataset object. Returns a mapping like {filename: xr.Dataset}.

Parameters:

Name Type Description Default
input_key Union[str, BytesIO]

The file to read in. Can be provided as a filepath or a bytes-like object. It is used to open the zip file.

required

Returns:

Type Description
Dict[str, Dataset]

Dict[str, xr.Dataset]: A mapping of {label: xr.Dataset}.

Source code in tsdat/io/readers/zip_reader.py
def read(self, input_key: str) -> Dict[str, xr.Dataset]:
    """Extracts the file into memory and uses registered `DataReaders` to read each relevant
    extracted file into its own xarray Dataset object. Returns a mapping like
    {filename: xr.Dataset}.

    Args:
        input_key (Union[str, BytesIO]): The file to read in. Can be provided as a filepath or
            a bytes-like object. It is used to open the zip file.

    Returns:
        Dict[str, xr.Dataset]: A mapping of {label: xr.Dataset}.
    """
    output: Dict[str, xr.Dataset] = {}

    # If we are reading from a string / filepath then add option to specify more
    # parameters for opening (i.e., mode or encoding options)
    fileobj = None
    if isinstance(input_key, str):  # Necessary for archiveReaders
        open_params = dict(mode="rb")
        open_params.update(self.parameters.open_zip_kwargs)
        fileobj = open(input_key, **open_params)  # type: ignore
    else:
        fileobj = input_key

    zip_file = ZipFile(file=fileobj, **self.parameters.read_zip_kwargs)  # type: ignore

    for filename in zip_file.namelist():
        if re.match(self.parameters.exclude, filename):  # type: ignore
            continue

        for key in self.parameters.readers.keys():
            if not re.match(key, filename):
                continue

            reader: Optional[DataReader] = self.parameters.readers.get(key, None)
            if reader:
                zip_bytes = BytesIO(zip_file.read(filename))
                data = reader.read(zip_bytes)  # type: ignore

                if isinstance(data, xr.Dataset):
                    data = {filename: data}
                output.update(data)

    return output
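
The control flow of `read` can be exercised without tsdat or xarray. The sketch below mirrors the loop above using only the standard library. Everything named here is a hypothetical stand-in: `read_csv_rows` plays the part of a registered DataReader and returns a list of rows instead of an `xr.Dataset`, and `read_archive` is not part of the tsdat API.

```python
import re
from io import BytesIO
from typing import Any, Callable, Dict, List
from zipfile import ZipFile


def make_archive() -> BytesIO:
    """Build a small in-memory zip archive (hypothetical contents)."""
    buf = BytesIO()
    with ZipFile(buf, mode="w") as zf:
        zf.writestr("data/a.csv", "x\ty\n1\t2\n")
        zf.writestr("__MACOSX/data/._a.csv", "junk")
        zf.writestr("readme.md", "no reader matches this file")
    buf.seek(0)
    return buf


def read_csv_rows(fileobj: BytesIO) -> List[List[str]]:
    """Stand-in reader: split tab-separated text into rows of strings."""
    text = fileobj.read().decode("utf-8")
    return [line.split("\t") for line in text.splitlines()]


def read_archive(
    fileobj: BytesIO,
    readers: Dict[str, Callable[[BytesIO], Any]],
    exclude: str,
) -> Dict[str, Any]:
    """Dispatch each non-excluded archive member to every matching reader."""
    output: Dict[str, Any] = {}
    with ZipFile(fileobj) as zf:
        for filename in zf.namelist():
            if re.match(exclude, filename):
                continue  # skipped by the exclude pattern
            for pattern, reader in readers.items():
                if re.match(pattern, filename):
                    # Each member is read fully into memory before parsing.
                    output[filename] = reader(BytesIO(zf.read(filename)))
    return output


readers = {r".*csv": read_csv_rows}
exclude = r"(?:.*__MACOSX/.*|.*\.DS_Store)"
result = read_archive(make_archive(), readers, exclude)
print(result)
```

As in the listing, a member that matches no reader pattern is silently dropped, and a member that matches several patterns is read once per match, with later results overwriting earlier ones under the same key.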