readers

Classes#

CSVReader #

Bases: DataReader

Uses pandas and xarray functions to read a csv file and extract its contents into an xarray Dataset object. Two parameters are supported: read_csv_kwargs and from_dataframe_kwargs, whose contents are passed as keyword arguments to pandas.read_csv() and xarray.Dataset.from_dataframe() respectively.

Attributes#

parameters `class-attribute` `instance-attribute` #

parameters: Parameters = Parameters()

Classes#

Parameters #

Bases: BaseModel

Attributes#

from_dataframe_kwargs class-attribute instance-attribute #

from_dataframe_kwargs: Dict[str, Any] = {}

read_csv_kwargs class-attribute instance-attribute #

read_csv_kwargs: Dict[str, Any] = {}

Functions#

read #

read(input_key: str) -> xr.Dataset

Source code in tsdat/io/readers.py

def read(self, input_key: str) -> xr.Dataset:
    df: pd.DataFrame = pd.read_csv(input_key, **self.parameters.read_csv_kwargs)  # type: ignore
    return xr.Dataset.from_dataframe(df, **self.parameters.from_dataframe_kwargs)
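
As a rough illustration of how the two parameter groups flow into the calls above, the following sketch repeats the same two steps outside of tsdat. The tab-separated sample text and the specific keyword values (`sep`, `index_col`, `parse_dates`) are assumptions chosen for the example, not defaults of `CSVReader`.

```python
from io import StringIO

import pandas as pd
import xarray as xr

# Hypothetical tab-separated content standing in for a real input file.
csv_text = "time\ttemp\n2024-01-01\t1.5\n2024-01-02\t2.5\n2024-01-03\t3.5\n"

# Stand-ins for CSVReader.parameters.read_csv_kwargs / from_dataframe_kwargs.
read_csv_kwargs = {"sep": "\t", "index_col": "time", "parse_dates": True}
from_dataframe_kwargs = {}

# The same two calls that CSVReader.read() makes, here on a file-like object.
df = pd.read_csv(StringIO(csv_text), **read_csv_kwargs)
ds = xr.Dataset.from_dataframe(df, **from_dataframe_kwargs)

print(ds)
```

Because `index_col` names the `time` column, the DataFrame index becomes the `time` dimension of the resulting Dataset, and `temp` becomes a data variable along it.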

NetCDFReader #

Bases: DataReader

Thin wrapper around xarray's open_dataset() function, with optional parameters used as keyword arguments in the function call.

Attributes#

parameters `class-attribute` `instance-attribute` #

parameters: Dict[str, Any] = {}

Functions#

read #

read(input_key: str) -> xr.Dataset

Source code in tsdat/io/readers.py

def read(self, input_key: str) -> xr.Dataset:
    return xr.open_dataset(input_key, **self.parameters)  # type: ignore

ParquetReader #

Bases: DataReader

Uses pandas and xarray functions to read a parquet file and extract its contents into an xarray Dataset object. Two parameters are supported: read_parquet_kwargs and from_dataframe_kwargs, whose contents are passed as keyword arguments to pandas.read_parquet() and xarray.Dataset.from_dataframe() respectively.

Attributes#

parameters `class-attribute` `instance-attribute` #

parameters: Parameters = Parameters()

Classes#

Parameters #

Bases: BaseModel

Attributes#

from_dataframe_kwargs class-attribute instance-attribute #

from_dataframe_kwargs: Dict[str, Any] = {}

read_parquet_kwargs class-attribute instance-attribute #

read_parquet_kwargs: Dict[str, Any] = {}

Functions#

read #

read(input_key: str) -> xr.Dataset

Source code in tsdat/io/readers.py

def read(self, input_key: str) -> xr.Dataset:
    df: pd.DataFrame = pd.read_parquet(input_key, **self.parameters.read_parquet_kwargs)  # type: ignore
    return xr.Dataset.from_dataframe(df, **self.parameters.from_dataframe_kwargs)

TarReader #

TarReader(parameters: Dict = None)

Bases: ArchiveReader

DataReader for reading from a tarred archive. Writing to this format is not supported.

This class requires that `readers` be specified in the parameters section of the storage configuration file. The structure of the `readers` section should mirror the structure of its parent `readers` section. To illustrate, consider the following configuration block:

readers:
  .*:
    tar:
      file_pattern: .*\.tar
      classname: tsdat.io.readers.TarReader
      parameters:
        # Parameters to specify how the TarReader should read/unpack the archive.
        # Parameters here are passed to the Python open() method as kwargs. The
        # default value is shown below.
        open_tar_kwargs:
          mode: "rb"

        # Parameters here are passed to tarfile.open() as kwargs. Useful for
        # specifying the system encoding or compression algorithm to use for
        # unpacking the archive. These are optional.
        read_tar_kwargs:
          mode: "r:gz"


        # The readers section tells the TarReader which DataReaders should be
        # used to handle the unpacked files.
        readers:
          .*\.csv:
            classname: tsdat.io.readers.CSVReader
            parameters:  # Parameters specific to tsdat.io.readers.CSVReader
              read_csv_kwargs:
                sep: '\t'

        # Pattern(s) used to exclude certain files in the archive from being handled.
        # This parameter is optional, and the default value is shown below:
        exclude: ['.*\_\_MACOSX/.*', '.*\.DS_Store']

Source code in tsdat/io/base.py

def __init__(self, parameters: Dict = None):  # type: ignore
    super().__init__(parameters=parameters)

    # Naively merge a list of regex patterns to exclude certain files from being
    # read. By default we exclude files that macOS creates when zipping a folder.
    exclude = [".*\\_\\_MACOSX/.*", ".*\\.DS_Store"]
    exclude.extend(getattr(self.parameters, "exclude", []))
    self.parameters.exclude = "(?:% s)" % "|".join(exclude)
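
To make the effect of the merged pattern concrete, here is a minimal standalone sketch of the same join-and-match logic using only the standard library. The extra `.*\.txt` pattern, the `is_excluded` helper, and the sample filenames are hypothetical and exist only for illustration.

```python
import re

# Built-in defaults, as in the constructor listing above.
exclude = [".*\\_\\_MACOSX/.*", ".*\\.DS_Store"]

# Hypothetical user-supplied pattern, as it might come from the `exclude` parameter.
exclude.extend([r".*\.txt"])

# Join all patterns into one alternation wrapped in a non-capturing group.
pattern = "(?:%s)" % "|".join(exclude)


def is_excluded(filename: str) -> bool:
    """Return True if the filename matches any of the merged patterns."""
    return re.match(pattern, filename) is not None


names = ["__MACOSX/._data.csv", "data/.DS_Store", "notes.txt", "data/obs.csv"]
kept = [name for name in names if not is_excluded(name)]
print(kept)
```

Since `re.match` anchors at the start of the string, each pattern begins with `.*` so that it can match files nested inside folders of the archive.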

Attributes#

parameters `class-attribute` `instance-attribute` #

parameters: Parameters = Parameters()

Classes#

Parameters #

Bases: BaseModel

Attributes#

exclude class-attribute instance-attribute #

exclude: List[str] = []

open_tar_kwargs class-attribute instance-attribute #

open_tar_kwargs: Dict[str, Any] = {}

read_tar_kwargs class-attribute instance-attribute #

read_tar_kwargs: Dict[str, Any] = {}

readers class-attribute instance-attribute #

readers: Dict[str, Any] = {}

Functions#

read #

read(input_key: str) -> Dict[str, xr.Dataset]

Extracts the file into memory and uses registered DataReaders to read each relevant extracted file into its own xarray Dataset object. Returns a mapping like {filename: xr.Dataset}.

Parameters:

Name	Type	Description	Default
`input_key`	`Union[str, BytesIO]`	The file to read in. Can be provided as a filepath or a bytes-like object. It is used to open the tar file.	required
`name`	`str`	A label used to help trace the origin of the data read-in.	required

Returns:

Type	Description
`Dict[str, Dataset]`	A mapping of {label: xr.Dataset}.

Source code in tsdat/io/readers.py

def read(self, input_key: str) -> Dict[str, xr.Dataset]:
    """------------------------------------------------------------------------------------
    Extracts the file into memory and uses registered `DataReaders` to read each relevant
    extracted file into its own xarray Dataset object. Returns a mapping like
    {filename: xr.Dataset}.

    Args:
        file (Union[str, BytesIO]): The file to read in. Can be provided as a filepath or
        a bytes-like object. It is used to open the tar file.
        name (str, optional): A label used to help trace the origin of the data read-in.
        It is used in the key in the returned dictionary. Must be provided if the `file`
        argument is not string-like. If `file` is a string and `name` is not specified then
        the label will be set by `file`. Defaults to None.

    Returns:
        Dict[str, xr.Dataset]: A mapping of {label: xr.Dataset}.

    ------------------------------------------------------------------------------------
    """

    output: Dict[str, xr.Dataset] = {}

    # If we are reading from a string / filepath then add option to specify more
    # parameters for opening (i.e., mode or encoding options)
    if isinstance(input_key, str):  # Necessary for archiveReaders
        open_params = dict(mode="rb")
        open_params.update(self.parameters.open_tar_kwargs)
        fileobj = open(input_key, **open_params)  # type: ignore
    else:
        fileobj = input_key

    tar = tarfile.open(fileobj=fileobj, **self.parameters.read_tar_kwargs)  # type: ignore

    for info_obj in tar:  # type: ignore
        filename = info_obj.name  # type: ignore
        if re.match(self.parameters.exclude, filename):  # type: ignore
            continue

        for key in self.parameters.readers.keys():
            reader: DataReader = self.parameters.readers.get(key, None)
            if reader:
                tar_bytes = BytesIO(tar.extractfile(filename).read())  # type: ignore
                data = reader.read(tar_bytes)  # type: ignore

                if isinstance(data, xr.Dataset):
                    data = {filename: data}  # type: ignore
                output.update(data)  # type: ignore

    return output
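
The unpack-and-dispatch flow described above can be sketched with the standard library alone. This is not the tsdat implementation: `build_tar` and `read_tar` are hypothetical helpers, the readers are plain callables rather than `DataReader` objects, and the sketch checks each filename against the reader's pattern key before dispatching (as the `ZipReader` listing does), which is an assumption about the intended behavior.

```python
import io
import re
import tarfile
from typing import Callable, Dict


def build_tar(files: Dict[str, bytes]) -> io.BytesIO:
    """Build a small gzip-compressed tar archive in memory (fixture only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


def read_tar(
    fileobj: io.BytesIO,
    readers: Dict[str, Callable[[io.BytesIO], str]],
    exclude: str,
) -> Dict[str, str]:
    """Read each non-excluded member with the first-class reader whose pattern matches."""
    output: Dict[str, str] = {}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            # Skip directories and anything matching the merged exclude pattern.
            if not member.isfile() or re.match(exclude, member.name):
                continue
            for pattern, reader in readers.items():
                if not re.match(pattern, member.name):
                    continue
                # Copy the member into memory so the reader gets a file-like object.
                payload = io.BytesIO(tar.extractfile(member).read())
                output[member.name] = reader(payload)
    return output


archive = build_tar({"a.csv": b"x\ty\n1\t2\n", ".DS_Store": b"junk", "b.bin": b"\x00"})
result = read_tar(
    archive,
    readers={r".*\.csv": lambda f: f.read().decode("utf-8")},
    exclude=r"(?:.*\_\_MACOSX/.*|.*\.DS_Store)",
)
print(result)
```

Only `a.csv` ends up in the result: `.DS_Store` is excluded, and `b.bin` matches no reader pattern.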

ZarrReader #

Bases: DataReader

Uses xarray's Zarr capabilities to read a Zarr archive and extract its contents into an xarray Dataset object.

Attributes#

parameters `class-attribute` `instance-attribute` #

parameters: Parameters = Parameters()

Classes#

Parameters #

Bases: BaseModel

Attributes#

open_zarr_kwargs class-attribute instance-attribute #

open_zarr_kwargs: Dict[str, Any] = {}

Functions#

read #

read(input_key: str) -> xr.Dataset

Source code in tsdat/io/readers.py

def read(self, input_key: str) -> xr.Dataset:
    return xr.open_zarr(input_key, **self.parameters.open_zarr_kwargs)  # type: ignore

ZipReader #

ZipReader(parameters: Dict = None)

Bases: ArchiveReader

DataReader for reading from a zipped archive. Writing to this format is not supported.

This class requires that `readers` be specified in the parameters section of the storage configuration file. The structure of the `readers` section should mirror the structure of its parent `readers` section. To illustrate, consider the following configuration block:

readers:
  .*:
    zip:
      file_pattern: .*\.zip
      classname: tsdat.io.readers.ZipReader
      parameters:
        # Parameters to specify how the ZipReader should read/unpack the archive.
        # Parameters here are passed to the Python open() method as kwargs. The
        # default value is shown below.
        open_zip_kwargs:
          mode: "rb"

        # Parameters here are passed to zipfile.ZipFile() as kwargs. Useful
        # for specifying the system encoding or compression algorithm to use for
        # unpacking the archive. These are optional.
        read_zip_kwargs:
          mode: "r"

        # The readers section tells the ZipReader which DataReaders should be
        # used to read the unpacked files.
        readers:
          .*\.csv:
            classname: tsdat.io.readers.CSVReader
            parameters:  # Parameters specific to tsdat.io.readers.CSVReader
              read_csv_kwargs:
                sep: '\t'

        # Pattern(s) used to exclude certain files in the archive from being handled.
        # This parameter is optional, and the default value is shown below:
        exclude: ['.*\_\_MACOSX/.*', '.*\.DS_Store']

Source code in tsdat/io/base.py

def __init__(self, parameters: Dict = None):  # type: ignore
    super().__init__(parameters=parameters)

    # Naively merge a list of regex patterns to exclude certain files from being
    # read. By default we exclude files that macOS creates when zipping a folder.
    exclude = [".*\\_\\_MACOSX/.*", ".*\\.DS_Store"]
    exclude.extend(getattr(self.parameters, "exclude", []))
    self.parameters.exclude = "(?:% s)" % "|".join(exclude)

Attributes#

parameters `class-attribute` `instance-attribute` #

parameters: Parameters = Parameters()

Classes#

Parameters #

Bases: BaseModel

Attributes#

exclude class-attribute instance-attribute #

exclude: List[str] = []

open_zip_kwargs class-attribute instance-attribute #

open_zip_kwargs: Dict[str, Any] = {}

read_zip_kwargs class-attribute instance-attribute #

read_zip_kwargs: Dict[str, Any] = {}

readers class-attribute instance-attribute #

readers: Dict[str, Any] = {}

Functions#

read #

read(input_key: str) -> Dict[str, xr.Dataset]

Extracts the file into memory and uses registered DataReaders to read each relevant extracted file into its own xarray Dataset object. Returns a mapping like {filename: xr.Dataset}.

Parameters:

Name	Type	Description	Default
`input_key`	`Union[str, BytesIO]`	The file to read in. Can be provided as a filepath or a bytes-like object. It is used to open the zip file.	required
`name`	`str`	A label used to help trace the origin of the data read-in.	required

Returns:

Type	Description
`Dict[str, Dataset]`	A mapping of {label: xr.Dataset}.

Source code in tsdat/io/readers.py

def read(self, input_key: str) -> Dict[str, xr.Dataset]:
    """------------------------------------------------------------------------------------
    Extracts the file into memory and uses registered `DataReaders` to read each relevant
    extracted file into its own xarray Dataset object. Returns a mapping like
    {filename: xr.Dataset}.

    Args:
        input_key (Union[str, BytesIO]): The file to read in. Can be provided as a filepath or
        a bytes-like object. It is used to open the zip file.
        name (str, optional): A label used to help trace the origin of the data read-in.
        It is used in the key in the returned dictionary. Must be provided if the `file`
        argument is not string-like. If `file` is a string and `name` is not specified then
        the label will be set by `file`. Defaults to None.

    Returns:
        Dict[str, xr.Dataset]: A mapping of {label: xr.Dataset}.

    ------------------------------------------------------------------------------------
    """
    output: Dict[str, xr.Dataset] = {}

    # If we are reading from a string / filepath then add option to specify more
    # parameters for opening (i.e., mode or encoding options)
    fileobj = None
    if isinstance(input_key, str):  # Necessary for archiveReaders
        open_params = dict(mode="rb")
        open_params.update(self.parameters.open_zip_kwargs)
        fileobj = open(input_key, **open_params)  # type: ignore
    else:
        fileobj = input_key

    zip = ZipFile(file=fileobj, **self.parameters.read_zip_kwargs)  # type: ignore

    for filename in zip.namelist():
        if re.match(self.parameters.exclude, filename):  # type: ignore
            continue

        for key in self.parameters.readers.keys():
            if not re.match(key, filename):
                continue
            reader: DataReader = self.parameters.readers.get(key, None)
            if reader:
                zip_bytes = BytesIO(zip.read(filename))
                data = reader.read(zip_bytes)  # type: ignore

                if isinstance(data, xr.Dataset):
                    data = {filename: data}
                output.update(data)

    return output
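
A minimal standard-library sketch of the same flow, accepting either a filepath or a bytes-like object, might look like the following. The `read_zip` helper, the in-memory fixture, and the use of plain callables in place of `DataReader` objects are all assumptions made for illustration.

```python
import io
import re
import zipfile
from typing import Callable, Dict, Union


def read_zip(
    input_key: Union[str, io.BytesIO],
    readers: Dict[str, Callable[[io.BytesIO], str]],
    exclude: str,
) -> Dict[str, str]:
    """Read each non-excluded archive member with every reader whose pattern matches."""
    # Open the file ourselves only when given a path, and close it afterwards.
    opened_here = isinstance(input_key, str)
    fileobj = open(input_key, "rb") if opened_here else input_key
    output: Dict[str, str] = {}
    try:
        with zipfile.ZipFile(fileobj, mode="r") as archive:
            for filename in archive.namelist():
                if re.match(exclude, filename):
                    continue
                for pattern, reader in readers.items():
                    if re.match(pattern, filename):
                        payload = io.BytesIO(archive.read(filename))
                        output[filename] = reader(payload)
    finally:
        if opened_here:
            fileobj.close()
    return output


# Build a small archive in memory, including a macOS metadata entry.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as zf:
    zf.writestr("data/obs.csv", "a\tb\n1\t2\n")
    zf.writestr("__MACOSX/._obs.csv", "junk")
buffer.seek(0)

result = read_zip(
    buffer,
    readers={r".*\.csv": lambda f: f.read().decode("utf-8")},
    exclude=r"(?:.*\_\_MACOSX/.*|.*\.DS_Store)",
)
print(result)
```

The `__MACOSX` entry also ends in `.csv`, but the exclude check runs first, so it never reaches a reader.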
