tsdat
Framework for developing time-series data pipelines that are configurable through yaml configuration files and custom code hooks and components. Developed with the Atmospheric, Oceanographic, and Renewable Energy domains in mind, but generally applicable to other domains as well.
Subpackages
Submodules
Classes
- CSVHandler – DataHandler specifically tailored to reading and writing files of a specific type.
- CSVReader – Uses pandas and xarray functions to read a csv file and extract its contents into an xarray Dataset object.
- CSVWriter – Converts a xr.Dataset object to a pandas DataFrame and saves the result to a csv file.
- CheckFailDelta – Checks the difference between consecutive values and reports a failure if the difference exceeds the fail_delta threshold.
- CheckFailMax – Checks that no values for the specified variable are greater than the fail_max threshold.
- CheckFailMin – Checks that no values for the specified variable are less than the fail_min threshold.
- CheckFailRangeMax – Checks that no values for the specified variable are greater than the maximum of the fail_range threshold.
- CheckFailRangeMin – Checks that no values for the specified variable are less than the minimum of the fail_range threshold.
- CheckMissing – Checks if any data are missing, i.e. set to the variable's _FillValue or NaN (NaT for datetime-like variables).
- CheckMonotonic – Checks if any values are not ordered strictly monotonically (values must all be increasing or all decreasing).
- CheckValidDelta – Checks the difference between consecutive values and reports a failure if the difference exceeds the valid_delta threshold.
- CheckValidMax – Checks that no values for the specified variable are greater than the valid_max threshold.
- CheckValidMin – Checks that no values for the specified variable are less than the valid_min threshold.
- CheckValidRangeMax – Checks that no values for the specified variable are greater than the maximum of the valid_range threshold.
- CheckValidRangeMin – Checks that no values for the specified variable are less than the minimum of the valid_range threshold.
- CheckWarnDelta – Checks the difference between consecutive values and reports a failure if the difference exceeds the warn_delta threshold.
- CheckWarnMax – Checks that no values for the specified variable are greater than the warn_max threshold.
- CheckWarnMin – Checks that no values for the specified variable are less than the warn_min threshold.
- CheckWarnRangeMax – Checks that no values for the specified variable are greater than the maximum of the warn_range threshold.
- CheckWarnRangeMin – Checks that no values for the specified variable are less than the minimum of the warn_range threshold.
- DataConverter – Base class for running data conversions on the retrieved raw dataset.
- DataHandler – Class that groups a DataReader subclass and a DataWriter subclass together to provide a unified approach to data I/O.
- DataReader – Base class for reading data from an input source.
- DataWriter – Base class for writing data to storage area(s).
- DatasetConfig – Class defining the structure and metadata of the dataset produced by a tsdat pipeline.
- DefaultRetriever – Reads data from one or more inputs, renames coordinates and data variables according to retrieval and dataset configurations, and applies registered DataConverters.
- FailPipeline – Raises a DataQualityError, halting the pipeline, if the data quality is sufficiently bad.
- FileHandler – DataHandler specifically tailored to reading and writing files of a specific type.
- FileSystem – Handles data storage and retrieval for file-based data formats.
- FileWriter – Base class for file-based DataWriters.
- IngestPipeline – Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability.
- NetCDFHandler – DataHandler specifically tailored to reading and writing files of a specific type.
- NetCDFReader – Thin wrapper around xarray's open_dataset() function, with optional parameters.
- NetCDFWriter – Thin wrapper around xarray's Dataset.to_netcdf() function for saving a dataset to a netCDF file.
- ParameterizedClass – Base class for any class that accepts 'parameters' as an argument.
- Pipeline – Base class for tsdat data pipelines.
- PipelineConfig – Class used to contain configuration parameters for tsdat pipelines.
- QualityChecker – Base class for code that checks the dataset / data variable quality.
- QualityConfig – Class used to contain quality configuration parameters for tsdat pipelines.
- QualityHandler – Base class for code that handles the dataset / data variable quality.
- QualityManagement – Main class for orchestrating the dispatch of QualityCheckers and QualityHandlers.
- QualityManager – Class that groups a single QualityChecker and one or more QualityHandlers so they can be run together.
- RecordQualityResults – Records the results of the quality check in an ancillary qc variable.
- ReplaceFailedValues – Replaces all failed values with the variable's _FillValue.
- Retriever – Base class for retrieving data used as input to tsdat pipelines.
- RetrieverConfig – Class used to contain configuration parameters for the tsdat retriever class.
- SortDatasetByCoordinate – Sorts the dataset by the failed variable, if there are any failures.
- Storage – Abstract base class for the tsdat Storage API.
- StorageConfig – Class used to contain storage configuration parameters for tsdat pipelines.
- StringToDatetime – Converts date strings into datetime64 data, accounting for the input format and timezone.
- UnitsConverter – Converts the units of a retrieved variable to the units specified by the variable's configuration.
Functions
- Thin wrapper around xarray.assert_allclose which also checks dataset and variable attributes.
- Assigns the data to the specified variable in the dataset.
- Decodes the dataset according to CF conventions.
- Returns a key consisting of the dataset's datastream, starting date/time, and file extension.
- Gets the start date and start time strings from a Dataset.
- Gets the earliest 'time' value and returns it as a pandas Timestamp.
- Records the message on the 'corrections_applied' attribute of the specified variable.
- Recursively calls model.instantiate() on all ParameterizedConfigClass instances under the model.
Function Descriptions
-
exception tsdat.DataQualityError
Bases: ValueError
Raised when the quality of a variable indicates a fatal error has occurred. Manual review of the data in question is often recommended in this case.
-
class tsdat.CSVHandler
Bases: tsdat.io.base.FileHandler
DataHandler specifically tailored to reading and writing files of a specific type.
- Parameters
reader (DataReader) – The DataReader subclass responsible for handling input data reading.
writer (FileWriter) – The FileWriter subclass responsible for handling output data writing.
-
extension
:str = csv¶
-
reader
:tsdat.io.readers.CSVReader¶
-
writer
:tsdat.io.writers.CSVWriter¶
-
class tsdat.CSVReader
Bases: tsdat.io.base.DataReader
Uses pandas and xarray functions to read a csv file and extract its contents into an xarray Dataset object. Two parameters are supported: read_csv_kwargs and from_dataframe_kwargs, whose contents are passed as keyword arguments to pandas.read_csv() and xarray.Dataset.from_dataframe() respectively.
-
class Parameters
Bases: pydantic.BaseModel
-
from_dataframe_kwargs
:Dict[str, Any]¶
-
read_csv_kwargs
:Dict[str, Any]¶
-
-
parameters
:CSVReader.Parameters¶
Class Methods
Uses the input key to open a resource and load data as a xr.Dataset object or as a mapping of strings to xr.Dataset objects.
Method Descriptions
-
read
(self, input_key: str) → xarray.Dataset¶ Uses the input key to open a resource and load data as a xr.Dataset object or as a mapping of strings to xr.Dataset objects.
In most cases DataReaders will only need to return a single xr.Dataset, but occasionally some types of inputs necessitate that the data loaded from the input_key be returned as a mapping. For example, if the input_key is a path to a zip file containing multiple disparate datasets, then returning a mapping is appropriate.
- Parameters
input_key (str) – An input key matching the DataReader's regex pattern that should be used to load data.
- Returns
The raw data extracted from the provided input key.
- Return type
Union[xr.Dataset, Dict[str, xr.Dataset]]
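Since CSVReader forwards read_csv_kwargs to pandas.read_csv() and from_dataframe_kwargs to xarray.Dataset.from_dataframe(), its behavior can be sketched with plain pandas. The file contents and keyword arguments below are illustrative, not from tsdat:

```python
import io
import pandas as pd

# Stand-in for a csv file on disk; CSVReader's read() receives a path-like input_key.
raw = io.StringIO("time,temp\n2024-01-01 00:00,10.5\n2024-01-01 01:00,11.0\n")

# read_csv_kwargs are forwarded verbatim to pandas.read_csv()
read_csv_kwargs = {"index_col": "time", "parse_dates": ["time"]}
df = pd.read_csv(raw, **read_csv_kwargs)

# CSVReader would then call xarray.Dataset.from_dataframe(df, **from_dataframe_kwargs)
# to produce the xr.Dataset returned by read(); this sketch stops at the DataFrame.
print(df["temp"].tolist())  # -> [10.5, 11.0]
```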
-
class tsdat.CSVWriter
Bases: tsdat.io.base.FileWriter
Converts a xr.Dataset object to a pandas DataFrame and saves the result to a csv file using pd.DataFrame.to_csv(). Properties under the to_csv_kwargs parameter are passed to pd.DataFrame.to_csv() as keyword arguments.
-
class Parameters
Bases: pydantic.BaseModel
-
dim_order
:Optional[List[str]]¶
-
to_csv_kwargs
:Dict[str, Any]¶
-
-
file_extension
:str = csv¶
-
parameters
:CSVWriter.Parameters¶
Class Methods
Writes the dataset to the provided filepath. This method is typically called by the tsdat storage API.
Method Descriptions
-
write
(self, dataset: xarray.Dataset, filepath: Optional[pathlib.Path] = None) → None¶ Writes the dataset to the provided filepath. This method is typically called by the tsdat storage API, which will be responsible for providing the filepath, including the file extension.
- Parameters
dataset (xr.Dataset) – The dataset to save.
filepath (Optional[Path]) – The path to the file to save.
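A sketch of what CSVWriter does under the hood: the dataset is first converted to a pandas DataFrame, then to_csv_kwargs are forwarded to pd.DataFrame.to_csv(). Plain pandas stands in for the xr.Dataset conversion step; values and kwargs are illustrative:

```python
import io
import pandas as pd

# A small table standing in for the result of xr.Dataset.to_dataframe()
df = pd.DataFrame({"time": ["2024-01-01", "2024-01-02"], "temp": [10.5, 11.0]})

# to_csv_kwargs are forwarded verbatim to pd.DataFrame.to_csv()
to_csv_kwargs = {"index": False, "float_format": "%.1f"}
buffer = io.StringIO()  # the storage API would supply a real filepath instead
df.to_csv(buffer, **to_csv_kwargs)
csv_text = buffer.getvalue()
```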
-
class tsdat.CheckFailDelta
Bases: _CheckDelta
Checks the difference between consecutive values and reports a failure if the difference exceeds the threshold specified by the value in the attribute provided to this check.
- Parameters
attribute_name (str) – The name of the attribute containing the threshold to use.
-
attribute_name
:str = fail_delta¶
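The core of a consecutive-difference check can be sketched with numpy. This is an illustrative reimplementation, not tsdat's own code; the threshold argument stands in for the value stored in the variable's fail_delta attribute:

```python
import numpy as np

def check_delta(values: np.ndarray, threshold: float) -> np.ndarray:
    """Flag a value as failed when the absolute change from its
    predecessor exceeds the threshold. The first value has no
    predecessor and always passes."""
    failures = np.zeros(values.shape, dtype=bool)
    failures[1:] = np.abs(np.diff(values)) > threshold
    return failures

# The jump from 1.2 to 5.0 exceeds the threshold of 1.0 and is flagged.
flags = check_delta(np.array([1.0, 1.2, 5.0, 5.1]), threshold=1.0)
```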
-
class tsdat.CheckFailMax
Bases: _CheckMax
Checks that no values for the specified variable are greater than a specified threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the maximum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the last value from the list will be used as the maximum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = fail_max¶
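The attribute-lookup behavior described above can be sketched as follows. This is an illustrative reimplementation with a plain dict standing in for the variable's attributes; the names are not tsdat's own:

```python
import numpy as np

def check_max(values: np.ndarray, attrs: dict, attribute_name: str = "fail_max") -> np.ndarray:
    """Flag values greater than the threshold stored under attribute_name."""
    threshold = attrs.get(attribute_name)
    if threshold is None:
        # The attribute does not exist on the variable -> no failures reported.
        return np.zeros(values.shape, dtype=bool)
    return values > threshold

flags = check_max(np.array([25.0, 31.5, 29.9]), {"fail_max": 30.0})
```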
-
class tsdat.CheckFailMin
Bases: _CheckMin
Checks that no values for the specified variable are less than a specified minimum threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the minimum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the first value from the list will be used as the minimum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = fail_min¶
-
class tsdat.CheckFailRangeMax
Bases: _CheckMax
Checks that no values for the specified variable are greater than a specified threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the maximum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the last value from the list will be used as the maximum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = fail_range¶
-
class tsdat.CheckFailRangeMin
Bases: _CheckMin
Checks that no values for the specified variable are less than a specified minimum threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the minimum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the first value from the list will be used as the minimum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = fail_range¶
-
class tsdat.CheckMissing
Bases: tsdat.qc.base.QualityChecker
Checks if any data are missing. A variable's data are considered missing if they are set to the variable's _FillValue (if it has a _FillValue) or NaN (NaT for datetime-like variables).
Class Methods
Checks the quality of a specific variable in the dataset and returns the results of the check as a boolean array.
Method Descriptions
-
run
(self, dataset: xarray.Dataset, variable_name: str) → numpy.typing.NDArray[numpy.bool8]¶ Checks the quality of a specific variable in the dataset and returns the results of the check as a boolean array where True values represent quality problems and False values represent data that passes the quality check.
QualityCheckers should not modify dataset variables; changes to the dataset should be made by QualityHandler(s), which receive the results of a QualityChecker as input.
- Parameters
dataset (xr.Dataset) – The dataset containing the variable to check.
variable_name (str) – The name of the variable to check.
- Returns
The results of the quality check, where True values indicate a quality problem.
- Return type
NDArray[np.bool8]
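CheckMissing's logic can be sketched with numpy. This is an illustrative reimplementation; the fill_value argument stands in for the variable's _FillValue attribute:

```python
import numpy as np

def check_missing(values: np.ndarray, fill_value=None) -> np.ndarray:
    """Data are 'missing' when equal to the fill value (if one is
    defined) or NaN; True in the result marks a quality problem."""
    failures = np.isnan(values)
    if fill_value is not None:
        failures |= values == fill_value
    return failures

flags = check_missing(np.array([1.0, np.nan, -9999.0]), fill_value=-9999.0)
```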
-
-
class tsdat.CheckMonotonic
Bases: tsdat.qc.base.QualityChecker
Checks if any values are not ordered strictly monotonically (i.e. values must all be increasing or all decreasing). The check marks all values as failed if any data values are not ordered monotonically.
-
class Parameters
Bases: pydantic.BaseModel
-
dim
:Optional[str]¶
-
require_decreasing
:bool = False¶
-
require_increasing
:bool = False¶
Class Methods
Method Descriptions
-
classmethod
check_monotonic_not_increasing_and_decreasing
(cls, inc: bool, values: Dict[str, Any]) → bool¶
-
-
parameters
:CheckMonotonic.Parameters¶
Class Methods
Checks the quality of a specific variable in the dataset and returns the results of the check as a boolean array.
Method Descriptions
-
get_axis
(self, variable: xarray.DataArray) → int¶
-
run
(self, dataset: xarray.Dataset, variable_name: str) → numpy.typing.NDArray[numpy.bool8]¶ Checks the quality of a specific variable in the dataset and returns the results of the check as a boolean array where True values represent quality problems and False values represent data that passes the quality check.
QualityCheckers should not modify dataset variables; changes to the dataset should be made by QualityHandler(s), which receive the results of a QualityChecker as input.
- Parameters
dataset (xr.Dataset) – The dataset containing the variable to check.
variable_name (str) – The name of the variable to check.
- Returns
The results of the quality check, where True values indicate a quality problem.
- Return type
NDArray[np.bool8]
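A minimal numpy sketch of the strict-monotonicity rule described above (illustrative, not tsdat's implementation; note that all values are flagged when the ordering is violated anywhere):

```python
import numpy as np

def check_monotonic(values: np.ndarray) -> np.ndarray:
    """Values must be strictly increasing or strictly decreasing;
    if not, every value is marked as failed."""
    diffs = np.diff(values)
    monotonic = bool(np.all(diffs > 0) or np.all(diffs < 0))
    return np.full(values.shape, not monotonic, dtype=bool)

assert not check_monotonic(np.array([1, 2, 3])).any()  # strictly increasing: passes
assert check_monotonic(np.array([1, 3, 2])).all()      # out of order: all flagged
```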
-
class tsdat.CheckValidDelta
Bases: _CheckDelta
Checks the difference between consecutive values and reports a failure if the difference exceeds the threshold specified by the value in the attribute provided to this check.
- Parameters
attribute_name (str) – The name of the attribute containing the threshold to use.
-
attribute_name
:str = valid_delta¶
-
class tsdat.CheckValidMax
Bases: _CheckMax
Checks that no values for the specified variable are greater than a specified threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the maximum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the last value from the list will be used as the maximum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = valid_max¶
-
class tsdat.CheckValidMin
Bases: _CheckMin
Checks that no values for the specified variable are less than a specified minimum threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the minimum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the first value from the list will be used as the minimum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = valid_min¶
-
class tsdat.CheckValidRangeMax
Bases: _CheckMax
Checks that no values for the specified variable are greater than a specified threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the maximum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the last value from the list will be used as the maximum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = valid_range¶
-
class tsdat.CheckValidRangeMin
Bases: _CheckMin
Checks that no values for the specified variable are less than a specified minimum threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the minimum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the first value from the list will be used as the minimum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = valid_range¶
-
class tsdat.CheckWarnDelta
Bases: _CheckDelta
Checks the difference between consecutive values and reports a failure if the difference exceeds the threshold specified by the value in the attribute provided to this check.
- Parameters
attribute_name (str) – The name of the attribute containing the threshold to use.
-
attribute_name
:str = warn_delta¶
-
class tsdat.CheckWarnMax
Bases: _CheckMax
Checks that no values for the specified variable are greater than a specified threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the maximum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the last value from the list will be used as the maximum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = warn_max¶
-
class tsdat.CheckWarnMin
Bases: _CheckMin
Checks that no values for the specified variable are less than a specified minimum threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the minimum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the first value from the list will be used as the minimum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = warn_min¶
-
class tsdat.CheckWarnRangeMax
Bases: _CheckMax
Checks that no values for the specified variable are greater than a specified threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the maximum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the last value from the list will be used as the maximum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = warn_range¶
-
class tsdat.CheckWarnRangeMin
Bases: _CheckMin
Checks that no values for the specified variable are less than a specified minimum threshold. The value of the threshold is specified by an attribute on each data variable, and the attribute to search for is specified as a property of this base class.
If the specified attribute does not exist on the variable being checked then no failures will be reported.
- Parameters
attribute_name (str) – The name of the attribute containing the minimum threshold. If the attribute ends in '_range' then it is assumed to be a list, and the first value from the list will be used as the minimum threshold.
allow_equal (bool) – True if values equal to the threshold should pass the check, False otherwise.
-
attribute_name
:str = warn_range¶
-
class tsdat.DataConverter
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for running data conversions on the retrieved raw dataset.
Class Methods
Runs the data converter on the provided (retrieved) dataset.
Method Descriptions
-
abstract
convert
(self, dataset: xarray.Dataset, dataset_config: tsdat.config.dataset.DatasetConfig, variable_name: str, **kwargs: Any) → xarray.Dataset¶ Runs the data converter on the provided (retrieved) dataset.
- Parameters
dataset (xr.Dataset) – The dataset to convert.
dataset_config (DatasetConfig) – The dataset configuration.
variable_name (str) – The name of the variable to convert.
- Returns
The converted dataset.
- Return type
xr.Dataset
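As an illustration of what a concrete convert() implementation might do, the sketch below applies a linear units conversion. The function name and signature are hypothetical, reduced to the array math at the heart of such a converter; a real subclass would operate on the named variable within the xr.Dataset and return the dataset:

```python
import numpy as np

def convert_units(values: np.ndarray, scale: float, offset: float) -> np.ndarray:
    """Hypothetical converter body: apply a linear scale and offset
    to a variable's data (the core of a units conversion)."""
    return values * scale + offset

# e.g. degrees Celsius -> Kelvin
kelvin = convert_units(np.array([0.0, 25.0]), scale=1.0, offset=273.15)
```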
-
class tsdat.DataHandler
Bases: tsdat.utils.ParameterizedClass
Class that groups a DataReader subclass and a DataWriter subclass together to provide a unified approach to data I/O.
- Parameters
reader (DataReader) – The DataReader subclass responsible for handling input data reading.
writer (FileWriter) – The FileWriter subclass responsible for handling output data writing.
-
parameters
:Any¶
-
reader
:DataReader¶
-
writer
:DataWriter¶
-
class tsdat.DataReader
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for reading data from an input source.
- Parameters
regex (Pattern[str]) – The regex pattern associated with the DataReader. If calling the DataReader from a tsdat pipeline, this pattern will be checked against each possible input key before the read() method is called.
Class Methods
Uses the input key to open a resource and load data as a xr.Dataset object or as a mapping of strings to xr.Dataset objects.
Method Descriptions
-
abstract
read
(self, input_key: str) → Union[xarray.Dataset, Dict[str, xarray.Dataset]]¶ Uses the input key to open a resource and load data as a xr.Dataset object or as a mapping of strings to xr.Dataset objects.
In most cases DataReaders will only need to return a single xr.Dataset, but occasionally some types of inputs necessitate that the data loaded from the input_key be returned as a mapping. For example, if the input_key is a path to a zip file containing multiple disparate datasets, then returning a mapping is appropriate.
- Parameters
input_key (str) – An input key matching the DataReader's regex pattern that should be used to load data.
- Returns
The raw data extracted from the provided input key.
- Return type
Union[xr.Dataset, Dict[str, xr.Dataset]]
-
class tsdat.DataWriter
Bases: tsdat.utils.ParameterizedClass, abc.ABC
Base class for writing data to storage area(s).
Class Methods
Writes the dataset to the storage area. This method is typically called by the tsdat storage API.
Method Descriptions
-
abstract
write
(self, dataset: xarray.Dataset, **kwargs: Any) → None¶ Writes the dataset to the storage area. This method is typically called by the tsdat storage API, which will be responsible for providing any additional parameters required by subclasses of the tsdat.io.base.DataWriter class.
- Parameters
dataset (xr.Dataset) – The dataset to save.
-
class tsdat.DatasetConfig
Bases: tsdat.config.utils.YamlModel
Class defining the structure and metadata of the dataset produced by a tsdat pipeline.
Also provides methods to support yaml parsing and validation, including generation of json schema.
- Parameters
attrs (GlobalAttributes) – Attributes that pertain to the dataset as a whole.
coords (Dict[str, Coordinate]) – The dataset’s coordinate variables.
data_vars (Dict[str, Variable]) – The dataset’s data variables.
-
attrs
:tsdat.config.attributes.GlobalAttributes¶
-
coords
:Dict[VarName, tsdat.config.variables.Coordinate]¶
-
data_vars
:Dict[VarName, tsdat.config.variables.Variable]¶
Class Methods
Method Descriptions
-
__contains__
(self, __o: object) → bool¶
-
__getitem__
(self, name: str) → Union[tsdat.config.variables.Variable, tsdat.config.variables.Coordinate]¶
-
classmethod
set_variable_name_property
(cls, vars: Dict[str, Dict[str, Any]]) → Dict[str, Dict[str, Any]]¶
-
classmethod
time_in_coords
(cls, coords: Dict[VarName, tsdat.config.variables.Coordinate]) → Dict[VarName, tsdat.config.variables.Coordinate]¶
-
classmethod
validate_variable_name_uniqueness
(cls, values: Any) → Any¶
-
variable_names_are_legal
(cls, vars: Dict[str, tsdat.config.variables.Variable], field: pydantic.fields.ModelField) → Dict[str, tsdat.config.variables.Variable]¶
-
class tsdat.DefaultRetriever
Bases: tsdat.io.base.Retriever
Reads data from one or more inputs, renames coordinates and data variables according to retrieval and dataset configurations, and applies registered DataConverters to retrieved data.
- Parameters
readers (Dict[Pattern[str], DataReader]) – A mapping of patterns to DataReaders that the retriever uses to determine which DataReader to use for reading any given input key.
coords (Dict[str, Dict[Pattern[str], VariableRetriever]]) – A dictionary mapping output coordinate variable names to rules for how they should be retrieved.
data_vars (Dict[str, Dict[Pattern[str], VariableRetriever]]) – A dictionary mapping output data variable names to rules for how they should be retrieved.
-
class Parameters
Bases: pydantic.BaseModel
-
merge_kwargs
:Dict[str, Any]¶ Keyword arguments passed to xr.merge(). This is only relevant if multiple input keys are provided simultaneously, or if any registered DataReader objects could return a dataset mapping instead of a single dataset.
-
-
coords
:Dict[str, Dict[Pattern, VariableRetriever]]¶ A dictionary mapping output coordinate names to the retrieval rules and preprocessing actions (e.g., DataConverters) that should be applied to each retrieved coordinate variable.
-
data_vars
:Dict[str, Dict[Pattern, VariableRetriever]]¶ A dictionary mapping output data variable names to the retrieval rules and preprocessing actions (e.g., DataConverters) that should be applied to each retrieved data variable.
-
parameters
:DefaultRetriever.Parameters¶
-
readers
:Dict[Pattern, tsdat.io.base.DataReader]¶ A dictionary of DataReaders that should be used to read data provided an input key.
Class Methods
Prepares the raw dataset mapping for use in downstream pipeline processes by consolidating the data into a single xr.Dataset object.
Method Descriptions
-
retrieve
(self, input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, **kwargs: Any) → xarray.Dataset¶ Prepares the raw dataset mapping for use in downstream pipeline processes by consolidating the data into a single xr.Dataset object. The retrieved dataset may contain additional coords and data_vars that are not defined in the output dataset. Input data converters are applied as part of the preparation process.
- Parameters
input_keys (List[str]) – The input keys the registered DataReaders should read from.
dataset_config (DatasetConfig) – The specification of the output dataset.
- Returns
The retrieved dataset.
- Return type
xr.Dataset
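The pattern-to-reader dispatch described in the Parameters above can be sketched with the standard re module. The reader names and the filename below are illustrative placeholders, not tsdat objects:

```python
import re

# A mapping of regex patterns to readers, as in DefaultRetriever.readers.
readers = {
    re.compile(r".*\.csv$"): "csv_reader",
    re.compile(r".*\.nc$"): "netcdf_reader",
}

def select_reader(input_key: str) -> str:
    """Return the first reader whose pattern matches the input key."""
    for pattern, reader in readers.items():
        if pattern.match(input_key):
            return reader
    raise ValueError(f"No reader registered for input key: {input_key}")

chosen = select_reader("data/buoy.20240101.000000.csv")
```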
-
class tsdat.FailPipeline
Bases: tsdat.qc.base.QualityHandler
Raises a DataQualityError, halting the pipeline, if the data quality is sufficiently bad. This usually indicates that manual inspection of the data is recommended.
- Raises
DataQualityError – Raised if the ratio of failed values exceeds the configured tolerance.
-
class Parameters
Bases: pydantic.BaseModel
-
context
:str =¶ Additional context set by users that ends up in the traceback message.
-
tolerance
:float = 0¶ Tolerance for the number of allowable failures as the ratio of allowable failures to the total number of values checked. Defaults to 0, meaning that any failed checks will result in a DataQualityError being raised.
-
-
parameters
:FailPipeline.Parameters¶
Class Methods
Handles the quality of a variable in the dataset and returns the dataset after any corrections have been applied.
Method Descriptions
-
run
(self, dataset: xarray.Dataset, variable_name: str, failures: numpy.typing.NDArray[numpy.bool8])¶ Handles the quality of a variable in the dataset and returns the dataset after any corrections have been applied.
- Parameters
dataset (xr.Dataset) – The dataset containing the variable to handle.
variable_name (str) – The name of the variable whose quality should be handled.
failures (NDArray[np.bool8]) – The results of the QualityChecker for the provided variable, where True values indicate a quality problem.
- Returns
The dataset after the QualityHandler has been run.
- Return type
xr.Dataset
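The tolerance logic described above can be sketched as follows. This is an illustrative reimplementation; the DataQualityError class here merely stands in for tsdat's own exception:

```python
import numpy as np

class DataQualityError(ValueError):
    """Stand-in for tsdat.DataQualityError."""

def fail_pipeline(failures: np.ndarray, tolerance: float = 0.0) -> None:
    """Raise if the ratio of failed values to values checked exceeds
    the tolerance; with the default of 0, any failure raises."""
    ratio = failures.mean()
    if ratio > tolerance:
        raise DataQualityError(f"{ratio:.0%} of values failed the quality check")

# 2 of 4 values failed (50%), which is within a 60% tolerance: no error raised.
fail_pipeline(np.array([False, False, True, True]), tolerance=0.6)
```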
-
class tsdat.FileHandler
Bases: DataHandler
DataHandler specifically tailored to reading and writing files of a specific type.
- Parameters
reader (DataReader) – The DataReader subclass responsible for handling input data reading.
writer (FileWriter) – The FileWriter subclass responsible for handling output data writing.
-
reader
:DataReader¶
-
writer
:FileWriter¶
-
class tsdat.FileSystem
Bases: tsdat.io.base.Storage
Handles data storage and retrieval for file-based data formats. Formats that write to directories (such as zarr) are not supported by the FileSystem storage class.
- Parameters
parameters (Parameters) – File-system specific parameters, such as the root path to where files should be saved, or additional keyword arguments to specific functions used by the storage API. See the FileSystemStorage.Parameters class for more details.
handler (FileHandler) – The FileHandler class that should be used to handle data I/O within the storage API.
-
class Parameters
Bases: pydantic.BaseSettings
-
file_timespan
:Optional[str]¶
-
merge_fetched_data_kwargs
:Dict[str, Any]¶
-
storage_root
:pathlib.Path¶ The path on disk where data and ancillary files will be saved to. Defaults to the storage/root folder in the active working directory. The directory is created as this parameter is set, if the directory does not already exist.
-
-
handler
:tsdat.io.handlers.FileHandler¶
-
parameters
:FileSystem.Parameters¶
Class Methods
Fetches data for a given datastream between a specified time range.
Saves an ancillary filepath to the datastream’s ancillary storage area.
Saves a dataset to the storage area. At a minimum, the dataset must have a 'datastream' global attribute and a 'time' variable.
Method Descriptions
-
fetch_data
(self, start: datetime.datetime, end: datetime.datetime, datastream: str) → xarray.Dataset¶ Fetches data for a given datastream between a specified time range.
Note: this method is not smart; it searches for the appropriate data files using their filenames and does not filter within each data file.
- Parameters
start (datetime) – The minimum datetime to fetch.
end (datetime) – The maximum datetime to fetch.
datastream (str) – The datastream id to search for.
- Returns
A dataset containing all the data in the storage area that spans the specified datetimes.
- Return type
xr.Dataset
-
save_ancillary_file
(self, filepath: pathlib.Path, datastream: str)¶ Saves an ancillary filepath to the datastream’s ancillary storage area.
- Parameters
filepath (Path) – The path to the ancillary file.
datastream (str) – The datastream that the file is related to.
-
save_data
(self, dataset: xarray.Dataset)¶ Saves a dataset to the storage area. At a minimum, the dataset must have a ‘datastream’ global attribute and must have a ‘time’ variable with a np.datetime64-like data type.
- Parameters
dataset (xr.Dataset) – The dataset to save.
-
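For illustration, a minimal storage configuration file for the FileSystem class might look like the sketch below. The keys mirror the attributes documented above (storage_root, file_timespan, handler); the exact file layout and the handler classname shown are assumptions based on tsdat's built-in handlers, so check your pipeline template for the authoritative structure.

```yaml
classname: tsdat.io.storage.FileSystem
parameters:
  storage_root: storage/root
  # file_timespan is optional; omit it to use the default behavior.
handler:
  classname: tsdat.io.handlers.NetCDFHandler
```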
class
tsdat.
FileWriter
[source]¶ Bases:
DataWriter
,abc.ABC
Base class for file-based DataWriters.
- Parameters
file_extension (str) – The file extension that the FileHandler should be used for, e.g., ".nc", ".csv", etc.
-
file_extension
:str¶
Class Methods
Writes the dataset to the provided filepath. This method is typically called by
Method Descriptions
-
abstract
write
(self, dataset: xarray.Dataset, filepath: Optional[pathlib.Path] = None, **kwargs: Any) → None¶ Writes the dataset to the provided filepath. This method is typically called by the tsdat storage API, which will be responsible for providing the filepath, including the file extension.
- Parameters
dataset (xr.Dataset) – The dataset to save.
filepath (Optional[Path]) – The path to the file to save.
-
class
tsdat.
IngestPipeline
[source]¶ Bases:
tsdat.pipeline.base.Pipeline
Pipeline class designed to read in raw, unstandardized time series data and enhance its quality and usability by converting it into a standard format, embedding metadata, applying quality checks and controls, generating reference plots, and saving the data in an accessible format so it can be used later in scientific analyses or in higher-level tsdat Pipelines.
Class Methods
User-overrideable code hook that runs after the retriever has retrieved the
User-overrideable code hook that runs after the dataset quality has been managed
User-overrideable code hook that runs after the dataset has been saved by the
Runs the data pipeline on the provided inputs.
Method Descriptions
-
hook_customize_dataset
(self, dataset: xarray.Dataset) → xarray.Dataset¶ User-overrideable code hook that runs after the retriever has retrieved the dataset from the specified input keys, but before the pipeline has applied any quality checks or corrections to the dataset.
- Parameters
dataset (xr.Dataset) – The output dataset structure returned by the retriever API.
- Returns
The customized dataset.
- Return type
xr.Dataset
-
hook_finalize_dataset
(self, dataset: xarray.Dataset) → xarray.Dataset¶ User-overrideable code hook that runs after the dataset quality has been managed but before the dataset has been sent to the storage API to be saved.
- Parameters
dataset (xr.Dataset) – The output dataset returned by the retriever API and modified by the hook_customize_dataset user code hook.
- Returns
The finalized dataset, ready to be saved.
- Return type
xr.Dataset
-
hook_plot_dataset
(self, dataset: xarray.Dataset)¶ User-overrideable code hook that runs after the dataset has been saved by the storage API.
- Parameters
dataset (xr.Dataset) – The dataset to plot.
-
run
(self, inputs: List[str], **kwargs: Any) → xarray.Dataset¶ Runs the data pipeline on the provided inputs.
- Parameters
inputs (List[str]) – A list of input keys that the pipeline’s Retriever class can use to load data into the pipeline.
- Returns
The processed dataset.
- Return type
xr.Dataset
-
-
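The hook order described above (customize, then quality management, then finalize, then save, then plot) can be illustrated with a simplified, stdlib-only sketch. This is not tsdat's actual implementation; the class name and the stand-in steps are purely illustrative of when each user hook runs.

```python
# Simplified sketch of the IngestPipeline hook dispatch order.
# Retrieval, quality management, and storage are replaced by stand-ins.
class SketchIngestPipeline:
    def __init__(self):
        self.calls = []

    # Default hooks are no-ops; subclasses override them as needed.
    def hook_customize_dataset(self, dataset):
        self.calls.append("customize")
        return dataset

    def hook_finalize_dataset(self, dataset):
        self.calls.append("finalize")
        return dataset

    def hook_plot_dataset(self, dataset):
        self.calls.append("plot")

    def run(self, inputs):
        dataset = {"source": inputs}     # stand-in for the retriever
        dataset = self.hook_customize_dataset(dataset)
        self.calls.append("quality")     # stand-in for quality management
        dataset = self.hook_finalize_dataset(dataset)
        self.calls.append("save")        # stand-in for the storage API
        self.hook_plot_dataset(dataset)  # plotting runs after data is saved
        return dataset

pipeline = SketchIngestPipeline()
pipeline.run(["input.csv"])
print(pipeline.calls)
```

Note that hook_plot_dataset runs after the storage API has saved the dataset, so plots reflect the data exactly as persisted.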
class
tsdat.
NetCDFHandler
[source]¶ Bases:
tsdat.io.base.FileHandler
DataHandler specifically tailored to reading and writing files of a specific type.
- Parameters
reader (DataReader) – The DataReader subclass responsible for handling input data reading.
writer (FileWriter) – The FileWriter subclass responsible for handling output data writing.
-
extension
:str = nc¶
-
reader
:tsdat.io.readers.NetCDFReader¶
-
writer
:tsdat.io.writers.NetCDFWriter¶
-
class
tsdat.
NetCDFReader
[source]¶ Bases:
tsdat.io.base.DataReader
Thin wrapper around xarray’s open_dataset() function, with optional parameters used as keyword arguments in the function call.
-
parameters
:Dict[str, Any]¶
Class Methods
Uses the input key to open a resource and load data as a xr.Dataset object or as
Method Descriptions
-
read
(self, input_key: str) → xarray.Dataset¶ Uses the input key to open a resource and load data as a xr.Dataset object or as a mapping of strings to xr.Dataset objects.
In most cases DataReaders will only need to return a single xr.Dataset, but occasionally some types of inputs necessitate that the data loaded from the input_key be returned as a mapping. For example, if the input_key is a path to a zip file containing multiple disparate datasets, then returning a mapping is appropriate.
- Parameters
input_key (str) – An input key matching the DataReader’s regex pattern that should be used to load data.
- Returns
The raw data extracted from the provided input key.
- Return type
Union[xr.Dataset, Dict[str, xr.Dataset]]
-
-
class
tsdat.
NetCDFWriter
[source]¶ Bases:
tsdat.io.base.FileWriter
Thin wrapper around xarray’s Dataset.to_netcdf() function for saving a dataset to a netCDF file. Properties under the to_netcdf_kwargs parameter will be passed to Dataset.to_netcdf() as keyword arguments.
File compression is used by default to save disk space. To disable compression set the use_compression parameter to False.
-
class
Parameters
¶ Bases:
pydantic.BaseModel
-
compression_kwargs
:Dict[str, Any]¶
-
to_netcdf_kwargs
:Dict[str, Any]¶
-
use_compression
:bool = True¶
-
-
file_extension
:str = nc¶
-
parameters
:NetCDFWriter.Parameters¶
Class Methods
Writes the dataset to the provided filepath. This method is typically called by
Method Descriptions
-
write
(self, dataset: xarray.Dataset, filepath: Optional[pathlib.Path] = None) → None¶ Writes the dataset to the provided filepath. This method is typically called by the tsdat storage API, which will be responsible for providing the filepath, including the file extension.
- Parameters
dataset (xr.Dataset) – The dataset to save.
filepath (Optional[Path]) – The path to the file to save.
-
class
-
class
tsdat.
ParameterizedClass
[source]¶ Bases:
pydantic.BaseModel
Base class for any class that accepts ‘parameters’ as an argument. Sets the default ‘parameters’ to {}. Subclasses of ParameterizedClass should override the ‘parameters’ properties to support custom required or optional arguments from configuration files.
-
parameters
:Any¶
-
-
class
tsdat.
ParameterizedConfigClass
[source]¶ Bases:
pydantic.BaseModel
-
classname
:pydantic.StrictStr¶
-
parameters
:Dict[str, Any]¶
Class Methods
Instantiates and returns the class specified by the ‘classname’ parameter.
Method Descriptions
-
classmethod
classname_looks_like_a_module
(cls, v: pydantic.StrictStr) → pydantic.StrictStr¶
-
instantiate
(self) → Any¶ Instantiates and returns the class specified by the ‘classname’ parameter.
- Returns
An instance of the specified class.
- Return type
Any
-
-
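The 'classname' resolution can be sketched with importlib. This stdlib-only example mirrors the idea rather than tsdat's exact code: it resolves a dotted module path, looks up the class, and calls it with keyword parameters.

```python
import importlib

def instantiate(classname: str, **parameters):
    """Resolve a dotted 'package.module.Class' path and instantiate it."""
    module_path, _, class_name = classname.rpartition(".")
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    return cls(**parameters)

# Example: instantiate a stdlib class via its dotted path.
obj = instantiate("collections.OrderedDict")
print(type(obj).__name__)  # OrderedDict
```

In tsdat, the same mechanism lets configuration files swap in custom subclasses simply by naming them.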
class
tsdat.
Pipeline
[source]¶ Bases:
tsdat.utils.ParameterizedClass
,abc.ABC
Base class for tsdat data pipelines.
-
dataset_config
:tsdat.config.dataset.DatasetConfig¶ Describes the structure and metadata of the output dataset.
-
quality
:tsdat.qc.base.QualityManagement¶ Manages the dataset quality through checks and corrections.
-
retriever
:tsdat.io.base.Retriever¶ Retrieves data from input keys.
-
settings
:Any¶
-
storage
:tsdat.io.base.Storage¶ Stores the dataset so it can be retrieved later.
-
triggers
:List[Pattern] = []¶ Regex patterns matching input keys to determine when the pipeline should run.
Class Methods
Modifies the retrieved dataset by dropping variables not declared in the
Runs the data pipeline on the provided inputs.
Method Descriptions
-
prepare_retrieved_dataset
(self, dataset: xarray.Dataset) → xarray.Dataset¶ Modifies the retrieved dataset by dropping variables not declared in the DatasetConfig, adding static variables, initializing non-retrieved variables, and importing global and variable-level attributes from the DatasetConfig.
- Parameters
dataset (xr.Dataset) – The retrieved dataset.
- Returns
The dataset with structure and metadata matching the DatasetConfig.
- Return type
xr.Dataset
-
abstract
run
(self, inputs: List[str], **kwargs: Any) → Any¶ Runs the data pipeline on the provided inputs.
- Parameters
inputs (List[str]) – A list of input keys that the pipeline’s Retriever class can use to load data into the pipeline.
- Returns
The processed dataset.
- Return type
xr.Dataset
-
-
class
tsdat.
PipelineConfig
[source]¶ Bases:
tsdat.config.utils.ParameterizedConfigClass
,tsdat.config.utils.YamlModel
Class used to contain configuration parameters for tsdat pipelines. This class will ultimately be converted into a tsdat.pipeline.base.Pipeline subclass for use in tsdat pipelines.
Provides methods to support yaml parsing and validation, including the generation of json schema for immediate validation. This class also provides a method to instantiate a tsdat.pipeline.base.Pipeline subclass from a parsed configuration file.
- Parameters
classname (str) – The dotted module path to the pipeline that the specified configurations should apply to. To use the built-in IngestPipeline, for example, you would set ‘tsdat.pipeline.pipelines.IngestPipeline’ as the classname.
triggers (List[Pattern[str]]) – A list of regex patterns that should trigger this pipeline when matched with an input key.
retriever (Union[Overrideable[RetrieverConfig], RetrieverConfig]) – Either the path to the retriever configuration yaml file and any overrides that should be applied, or the retriever configurations themselves.
dataset (Union[Overrideable[DatasetConfig], DatasetConfig]) – Either the path to the dataset configuration yaml file and any overrides that should be applied, or the dataset configurations themselves.
quality (Union[Overrideable[QualityConfig], QualityConfig]) – Either the path to the quality configuration yaml file and any overrides that should be applied, or the quality configurations themselves.
storage (Union[Overrideable[StorageConfig], StorageConfig]) – Either the path to the storage configuration yaml file and any overrides that should be applied, or the storage configurations themselves.
-
dataset
:Union[tsdat.config.utils.Overrideable[tsdat.config.dataset.DatasetConfig], tsdat.config.dataset.DatasetConfig]¶
-
quality
:Union[tsdat.config.utils.Overrideable[tsdat.config.quality.QualityConfig], tsdat.config.quality.QualityConfig]¶
-
retriever
:Union[tsdat.config.utils.Overrideable[tsdat.config.retriever.RetrieverConfig], tsdat.config.retriever.RetrieverConfig]¶
-
storage
:Union[tsdat.config.utils.Overrideable[tsdat.config.storage.StorageConfig], tsdat.config.storage.StorageConfig]¶
-
triggers
:List[Pattern]¶
Class Methods
This method instantiates the tsdat.pipeline.BasePipeline subclass referenced by the
Method Descriptions
-
instantiate_pipeline
(self) → tsdat.pipeline.base.Pipeline¶ This method instantiates the tsdat.pipeline.BasePipeline subclass referenced by the classname property on the PipelineConfig instance and passes all properties on the PipelineConfig class (except for ‘classname’) as keyword arguments to the constructor of the tsdat.pipeline.BasePipeline subclass.
Properties and sub-properties of the PipelineConfig class that are subclasses of tsdat.config.utils.ParameterizedConfigClass (e.g, classes that define a ‘classname’ and optional ‘parameters’ properties) will also be instantiated in similar fashion. See tsdat.config.utils.recursive_instantiate for implementation details.
- Returns
An instance of a tsdat.pipeline.base.Pipeline subclass.
- Return type
tsdat.pipeline.base.Pipeline
-
classmethod
merge_overrideable_yaml
(cls, v: Dict[str, Any], values: Dict[str, Any], field: pydantic.fields.ModelField)¶
-
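For illustration, a pipeline configuration file for this class might be sketched as follows. The top-level keys match the properties documented above; the 'path' key under each section reflects the Overrideable form, and all file paths and the trigger pattern are illustrative assumptions.

```yaml
classname: tsdat.pipeline.pipelines.IngestPipeline
triggers:
  - .*\.csv
retriever:
  path: shared/retriever.yaml
dataset:
  path: shared/dataset.yaml
quality:
  path: shared/quality.yaml
storage:
  path: shared/storage.yaml
```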
class
tsdat.
QualityChecker
[source]¶ Bases:
tsdat.utils.ParameterizedClass
,abc.ABC
Base class for code that checks the dataset / data variable quality.
Class Methods
Checks the quality of a specific variable in the dataset and returns the results
Method Descriptions
-
abstract
run
(self, dataset: xarray.Dataset, variable_name: str) → numpy.typing.NDArray[numpy.bool8]¶ Checks the quality of a specific variable in the dataset and returns the results of the check as a boolean array where True values represent quality problems and False values represent data that passes the quality check.
QualityCheckers should not modify dataset variables; changes to the dataset should be made by QualityHandler(s), which receive the results of a QualityChecker as input.
- Parameters
dataset (xr.Dataset) – The dataset containing the variable to check.
variable_name (str) – The name of the variable to check.
- Returns
The results of the quality check, where True values indicate a quality problem.
- Return type
NDArray[np.bool8]
-
-
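The checker contract (return a boolean failure mask; never mutate the data) can be illustrated with a stdlib-only sketch. Real tsdat checkers operate on xr.Dataset objects and return numpy boolean arrays; this simplified max-value check on a plain list is only an analogy.

```python
def check_max(values, maximum):
    """Return a boolean mask where True marks a quality problem
    (a value above the allowed maximum). The input is not modified."""
    return [v > maximum for v in values]

failures = check_max([1.0, 5.0, 2.0, 9.0], maximum=4.0)
print(failures)  # [False, True, False, True]
```

The mask is then handed to one or more QualityHandlers, which decide what to do with the flagged values.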
class
tsdat.
QualityConfig
[source]¶ Bases:
tsdat.config.utils.YamlModel
Class used to contain quality configuration parameters for tsdat pipelines. This class will ultimately be converted into a tsdat.qc.base.QualityManagement class for use in downstream tsdat pipeline code.
Provides methods to support yaml parsing and validation, including the generation of json schema for immediate validation.
- Parameters
managers (List[ManagerConfig]) – A list of quality checks and controls that should be applied.
-
managers
:List[ManagerConfig]¶
Class Methods
Method Descriptions
-
classmethod
validate_manager_names_are_unique
(cls, v: List[ManagerConfig]) → List[ManagerConfig]¶
-
class
tsdat.
QualityHandler
[source]¶ Bases:
tsdat.utils.ParameterizedClass
,abc.ABC
Base class for code that handles the dataset / data variable quality.
Class Methods
Handles the quality of a variable in the dataset and returns the dataset after
Method Descriptions
-
abstract
run
(self, dataset: xarray.Dataset, variable_name: str, failures: numpy.typing.NDArray[numpy.bool8]) → xarray.Dataset¶ Handles the quality of a variable in the dataset and returns the dataset after any corrections have been applied.
- Parameters
dataset (xr.Dataset) – The dataset containing the variable to handle.
variable_name (str) – The name of the variable whose quality should be handled.
failures (NDArray[np.bool8]) – The results of the QualityChecker for the provided variable, where True values indicate a quality problem.
- Returns
The dataset after the QualityHandler has been run.
- Return type
xr.Dataset
-
-
class
tsdat.
QualityManagement
[source]¶ Bases:
pydantic.BaseModel
Main class for orchestrating the dispatch of QualityCheckers and QualityHandlers.
- Parameters
managers (List[QualityManager]) – The list of QualityManagers that should be run.
-
managers
:List[QualityManager]¶
Class Methods
Runs the registered QualityManagers on the dataset.
Method Descriptions
-
manage
(self, dataset: xarray.Dataset) → xarray.Dataset¶ Runs the registered QualityManagers on the dataset.
- Parameters
dataset (xr.Dataset) – The dataset to apply quality checks and controls to.
- Returns
The quality-checked dataset.
- Return type
xr.Dataset
-
class
tsdat.
QualityManager
[source]¶ Bases:
pydantic.BaseModel
Class that groups a single QualityChecker and one or more QualityHandlers so they can be dispatched together.
- Parameters
name (str) – The name of the quality manager.
checker (QualityChecker) – The quality check that should be run.
handlers (List[QualityHandler]) – One or more QualityHandlers that should be run given the results of the checker.
apply_to (List[str]) – A list of variables that the check should run for. Accepts keywords of 'COORDS' or 'DATA_VARS', or any number of specific variables that should be run.
exclude (List[str]) – A list of variables that the check should exclude. Accepts the same keywords as apply_to.
-
apply_to
:List[str]¶
-
checker
:QualityChecker¶
-
exclude
:List[str] = []¶
-
handlers
:List[QualityHandler]¶
-
name
:str¶
Class Methods
Runs the quality manager on the dataset.
Method Descriptions
-
run
(self, dataset: xarray.Dataset) → xarray.Dataset¶ Runs the quality manager on the dataset.
- Parameters
dataset (xr.Dataset) – The dataset to apply quality checks / controls to.
- Returns
The dataset after the quality check and controls have been applied.
- Return type
xr.Dataset
-
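A quality manager of this shape is typically declared in the quality configuration yaml. The sketch below groups one checker with one handler; the checker and handler classnames are examples based on tsdat's built-in qc modules, and the parameter values are illustrative.

```yaml
managers:
  - name: Require valid temperature range
    checker:
      classname: tsdat.qc.checkers.CheckValidMax
    handlers:
      - classname: tsdat.qc.handlers.RecordQualityResults
        parameters:
          bit: 1
          assessment: bad
          meaning: "Value exceeded the valid maximum."
    apply_to: [DATA_VARS]
    exclude: []
```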
class
tsdat.
RecordQualityResults
[source]¶ Bases:
tsdat.qc.base.QualityHandler
Records the results of the quality check in an ancillary qc variable. Creates the ancillary qc variable if one does not already exist.
-
class
Parameters
¶ Bases:
pydantic.BaseModel
-
assessment
:Literal[bad, indeterminate]¶ Indicates the quality of the data if the test results indicate a failure.
-
bit
:int¶ The bit number (e.g., 1, 2, 3, …) used to indicate if the check passed. The quality results are bitpacked into an integer array to preserve space. For example, if ‘check #0’ uses bit 0 and fails, and ‘check #1’ uses bit 1 and fails then the resulting value on the qc variable would be 2^(0) + 2^(1) = 3. If we had a third check it would be 2^(0) + 2^(1) + 2^(2) = 7.
-
meaning
:str¶ A string that describes the test applied.
-
-
parameters
:RecordQualityResults.Parameters¶
Class Methods
Handles the quality of a variable in the dataset and returns the dataset after
Method Descriptions
-
run
(self, dataset: xarray.Dataset, variable_name: str, failures: numpy.typing.NDArray[numpy.bool8]) → xarray.Dataset¶ Handles the quality of a variable in the dataset and returns the dataset after any corrections have been applied.
- Parameters
dataset (xr.Dataset) – The dataset containing the variable to handle.
variable_name (str) – The name of the variable whose quality should be handled.
failures (NDArray[np.bool8]) – The results of the QualityChecker for the provided variable, where True values indicate a quality problem.
- Returns
The dataset after the QualityHandler has been run.
- Return type
xr.Dataset
-
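The bitpacking described for the bit parameter is ordinary powers-of-two arithmetic. A small stdlib sketch, mirroring the docstring's example where checks on bit 0 and bit 1 both fail:

```python
def pack_failures(failed_bits):
    """Bitpack a set of failed check bit numbers into one integer flag."""
    value = 0
    for bit in failed_bits:
        value |= 1 << bit  # contributes 2**bit to the packed flag
    return value

print(pack_failures({0, 1}))     # 3, i.e. 2**0 + 2**1
print(pack_failures({0, 1, 2}))  # 7, i.e. 2**0 + 2**1 + 2**2
```

Because each check owns one bit, a single integer qc variable records the pass/fail state of every check at once.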
class
-
class
tsdat.
ReplaceFailedValues
[source]¶ Bases:
tsdat.qc.base.QualityHandler
Replaces all failed values with the variable’s _FillValue. If the variable does not have a _FillValue attribute, then NaN is used instead.
Class Methods
Handles the quality of a variable in the dataset and returns the dataset after
Method Descriptions
-
run
(self, dataset: xarray.Dataset, variable_name: str, failures: numpy.typing.NDArray[numpy.bool8]) → xarray.Dataset¶ Handles the quality of a variable in the dataset and returns the dataset after any corrections have been applied.
- Parameters
dataset (xr.Dataset) – The dataset containing the variable to handle.
variable_name (str) – The name of the variable whose quality should be handled.
failures (NDArray[np.bool8]) – The results of the QualityChecker for the provided variable, where True values indicate a quality problem.
- Returns
The dataset after the QualityHandler has been run.
- Return type
xr.Dataset
-
-
class
tsdat.
Retriever
[source]¶ Bases:
tsdat.utils.ParameterizedClass
,abc.ABC
Base class for retrieving data used as input to tsdat pipelines.
- Parameters
readers (Dict[str, DataReader]) – The mapping of readers that should be used to retrieve data given input_keys and optional keyword arguments provided by subclasses of Retriever.
-
readers
:Dict[Pattern, Any]¶ Mapping of readers that should be used to read data given input keys.
Class Methods
Prepares the raw dataset mapping for use in downstream pipeline processes by
Method Descriptions
-
abstract
retrieve
(self, input_keys: List[str], dataset_config: tsdat.config.dataset.DatasetConfig, **kwargs: Any) → xarray.Dataset¶ Prepares the raw dataset mapping for use in downstream pipeline processes by consolidating the data into a single xr.Dataset object. The retrieved dataset may contain additional coords and data_vars that are not defined in the output dataset. Input data converters are applied as part of the preparation process.
- Parameters
input_keys (List[str]) – The input keys the registered DataReaders should read from.
dataset_config (DatasetConfig) – The specification of the output dataset.
- Returns
The retrieved dataset.
- Return type
xr.Dataset
-
class
tsdat.
RetrieverConfig
[source]¶ Bases:
tsdat.config.utils.ParameterizedConfigClass
,tsdat.config.utils.YamlModel
Class used to contain configuration parameters for the tsdat retriever class. This class will ultimately be converted into a tsdat.io.base.Retriever subclass for use in tsdat pipelines.
Provides methods to support yaml parsing and validation, including the generation of json schema for immediate validation. This class also provides a method to instantiate a tsdat.io.base.Retriever subclass from a parsed configuration file.
- Parameters
classname (str) – The dotted module path to the retriever class that the specified configurations should apply to.
readers (Dict[str, DataReaderConfig]) – The DataReaders to use for reading input data.
-
readers
:Dict[Pattern, DataReaderConfig]¶
-
class
tsdat.
SortDatasetByCoordinate
[source]¶ Bases:
tsdat.qc.base.QualityHandler
Sorts the dataset by the failed variable, if there are any failures.
-
class
Parameters
¶ Bases:
pydantic.BaseModel
-
ascending
:bool = True¶ Whether to sort the dataset in ascending order. Defaults to True.
-
-
parameters
:SortDatasetByCoordinate.Parameters¶
Class Methods
Handles the quality of a variable in the dataset and returns the dataset after
Method Descriptions
-
run
(self, dataset: xarray.Dataset, variable_name: str, failures: numpy.typing.NDArray[numpy.bool8]) → xarray.Dataset¶ Handles the quality of a variable in the dataset and returns the dataset after any corrections have been applied.
- Parameters
dataset (xr.Dataset) – The dataset containing the variable to handle.
variable_name (str) – The name of the variable whose quality should be handled.
failures (NDArray[np.bool8]) – The results of the QualityChecker for the provided variable, where True values indicate a quality problem.
- Returns
The dataset after the QualityHandler has been run.
- Return type
xr.Dataset
-
-
class
tsdat.
Storage
[source]¶ Bases:
tsdat.utils.ParameterizedClass
,abc.ABC
Abstract base class for the tsdat Storage API. Subclasses of Storage are used in pipelines to persist data and ancillary files (e.g., plots).
- Parameters
parameters (Any) – Configuration parameters for the Storage API. The specific parameters that are allowed will be defined by subclasses of this base class.
handler (DataHandler) – The DataHandler responsible for handling both read and write operations needed by the storage API.
-
handler
:DataHandler¶ Defines methods for reading and writing datasets from the storage area.
-
parameters
:Any¶ (Internal) parameters used by the storage API that can be set through configuration files, environment variables, or other means.
Class Methods
Fetches a dataset from the storage area where the dataset’s time span is between
Saves an ancillary file (e.g., a plot, non-dataset metadata file, etc) to the
Saves the dataset to the storage area.
Context manager that can be used to upload many ancillary files at once. This
Method Descriptions
-
abstract
fetch_data
(self, start: datetime.datetime, end: datetime.datetime, datastream: str) → xarray.Dataset¶ Fetches a dataset from the storage area where the dataset’s time span is between the specified start and end times.
- Parameters
start (datetime) – The start time bound.
end (datetime) – The end time bound.
datastream (str) – The name of the datastream to fetch.
- Returns
The fetched dataset.
- Return type
xr.Dataset
-
abstract
save_ancillary_file
(self, filepath: pathlib.Path, datastream: str)¶ Saves an ancillary file (e.g., a plot, non-dataset metadata file, etc) to the storage area for the specified datastream.
- Parameters
filepath (Path) – Where the file that should be saved is currently located.
datastream (str) – The datastream that the ancillary file is associated with.
-
abstract
save_data
(self, dataset: xarray.Dataset)¶ Saves the dataset to the storage area.
- Parameters
dataset (xr.Dataset) – The dataset to save.
-
uploadable_dir
(self, datastream: str) → Generator[pathlib.Path, None, None]¶ Context manager that can be used to upload many ancillary files at once. This method yields the path to a temporary directory whose contents will be saved to the storage area using the save_ancillary_file method upon exiting the context manager.
- Parameters
datastream (str) – The datastream associated with any files written to the uploadable directory.
- Yields
Generator[Path, None, None] – A temporary directory whose contents should be saved to the storage area.
-
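The shape of such a context manager can be sketched with contextlib and tempfile. This is a simplified stand-in rather than the tsdat implementation: here "saving" just records each file name, where the real Storage subclass would call its save_ancillary_file method.

```python
import contextlib
import pathlib
import tempfile

saved = []  # stand-in for the storage area

def save_ancillary_file(filepath: pathlib.Path, datastream: str):
    """Stand-in for Storage.save_ancillary_file."""
    saved.append((datastream, filepath.name))

@contextlib.contextmanager
def uploadable_dir(datastream: str):
    """Yield a temporary directory; on exit, save every file written to it."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = pathlib.Path(tmp)
        yield tmp_path
        for path in sorted(tmp_path.iterdir()):
            save_ancillary_file(path, datastream)

# Usage: write several plots, then let the context manager upload them.
with uploadable_dir("abc.buoy_z01.a1") as tmp_dir:
    (tmp_dir / "plot1.png").write_text("fake plot")
    (tmp_dir / "plot2.png").write_text("fake plot")

print(saved)
```

Batching uploads this way keeps plotting code simple: it only writes local files, and the storage backend decides how and where they are persisted.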
class
tsdat.
StorageConfig
[source]¶ Bases:
tsdat.config.utils.ParameterizedConfigClass
,tsdat.config.utils.YamlModel
Class used to contain configuration parameters for tsdat pipelines. This class will ultimately be converted into a tsdat.pipeline.base.Pipeline subclass for use in tsdat pipelines.
Provides methods to support yaml parsing and validation, including the generation of json schema for immediate validation. This class also provides a method to instantiate a tsdat.io.base.Storage subclass from a parsed configuration file.
- Parameters
classname (str) – The dotted module path to the storage class that the specified configurations should apply to. To use the built-in FileSystemStorage, for example, you would set 'tsdat.io.storage.FileSystemStorage' as the classname.
handler (DataHandlerConfig) – Config class that should be used for data I/O within the storage area.
-
handler
:DataHandlerConfig¶
-
class
tsdat.
StringToDatetime
[source]¶ Bases:
tsdat.io.base.DataConverter
Converts date strings into datetime64 data, accounting for the input format and timezone.
- Parameters
format (Optional[str]) – The format of the string data. See strftime.org for more information on what components can be used. If None, the converter will try to interpret the format and convert it automatically. This can be unsafe but is not explicitly prohibited, so a warning is issued if format is not set explicitly.
timezone (Optional[str]) – The timezone of the input data. If not specified it is assumed to be UTC.
to_datetime_kwargs (Dict[str, Any]) – A set of keyword arguments passed to the pandas.to_datetime() function as keyword arguments. Note that 'format' is already included as a keyword argument. Defaults to {}.
-
format
:Optional[str]¶ %S’ for date strings such as ‘2022-04-13 23:59:00’), or None (the default) to have pandas guess the format automatically.
- Type
The date format the string is using (e.g., ‘%Y-%m-%d %H
- Type
%M
-
timezone
:Optional[str]¶ The timezone of the data to convert. If provided, this converter will apply the appropriate offset to convert data from the specified timezone to UTC. The timezone of the output data is assumed to always be UTC.
-
to_datetime_kwargs
:Dict[str, Any]¶ Any parameters set here will be passed to pd.to_datetime as keyword arguments.
Class Methods
Runs the data converter on the provided (retrieved) dataset.
Method Descriptions
-
convert
(self, dataset: xarray.Dataset, dataset_config: tsdat.config.dataset.DatasetConfig, variable_name: str, **kwargs: Any) → xarray.Dataset¶ Runs the data converter on the provided (retrieved) dataset.
- Parameters
dataset (xr.Dataset) – The dataset to convert.
dataset_config (DatasetConfig) – The dataset configuration.
variable_name (str) – The name of the variable to convert.
- Returns
The converted dataset.
- Return type
xr.Dataset
-
classmethod
warn_if_no_format_set
(cls, format: Optional[str]) → Optional[str]¶
-
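The format strings follow Python's strftime/strptime components. A stdlib-only example using the docstring's own sample date (tsdat itself applies the format through pandas.to_datetime, but the components are the same):

```python
from datetime import datetime

# '%Y-%m-%d %H:%M:%S' matches strings like '2022-04-13 23:59:00'.
parsed = datetime.strptime("2022-04-13 23:59:00", "%Y-%m-%d %H:%M:%S")
print(parsed.isoformat())  # 2022-04-13T23:59:00
```

Setting the format explicitly avoids the ambiguity of automatic interpretation (e.g., day-first versus month-first dates).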
class
tsdat.
UnitsConverter
[source]¶ Bases:
tsdat.io.base.DataConverter
Converts the units of a retrieved variable to the units specified by the variable’s specification in the DatasetConfig.
If the ‘input_units’ property is set then that string is used to determine the input units; otherwise the converter will attempt to look up and use the ‘units’ attribute on the specified variable in the dataset provided to the convert method. If the input units cannot be determined then a warning is issued and the original dataset is returned.
- Parameters
input_units (Optional[str]) – The units that the retrieved data comes in.
-
input_units
:Optional[str]¶ The units of the input data.
Class Methods
Runs the data converter on the provided (retrieved) dataset.
Method Descriptions
-
convert
(self, dataset: xarray.Dataset, dataset_config: tsdat.config.dataset.DatasetConfig, variable_name: str, **kwargs: Any) → xarray.Dataset¶ Runs the data converter on the provided (retrieved) dataset.
- Parameters
dataset (xr.Dataset) – The dataset to convert.
dataset_config (DatasetConfig) – The dataset configuration.
variable_name (str) – The name of the variable to convert.
- Returns
The converted dataset.
- Return type
xr.Dataset
-
class
tsdat.
YamlModel
[source]¶ Bases:
pydantic.BaseModel
Class Methods
Method Descriptions
-
classmethod
from_yaml
(cls, filepath: pathlib.Path, overrides: Optional[Dict[str, Any]] = None)¶
-
classmethod
generate_schema
(cls, output_file: pathlib.Path)¶
-
-
tsdat.
assert_close
(a: xarray.Dataset, b: xarray.Dataset, check_attrs: bool = True, check_fill_value: bool = True, **kwargs: Any) → None[source]¶ Thin wrapper around xarray.assert_allclose which also checks dataset and variable attrs. Removes global attributes that are allowed to be different, which are currently just the ‘history’ attribute and the ‘code_version’ attribute, and also handles some obscure edge cases for variable attributes.
- Parameters
a (xr.Dataset) – The first dataset to compare.
b (xr.Dataset) – The second dataset to compare.
check_attrs (bool) – Check global and variable attributes in addition to the data. Defaults to True.
check_fill_value (bool) – Check the _FillValue attribute. This is a special case because xarray moves the _FillValue from a variable's attributes to its encoding upon saving the dataset. Defaults to True.
-
tsdat.
assign_data
(dataset: xarray.Dataset, data: numpy.typing.NDArray[Any], variable_name: str) → xarray.Dataset[source]¶ Assigns the data to the specified variable in the dataset.
If the variable exists and it is a data variable, then the DataArray for the specified variable in the dataset will simply have its data replaced with the new numpy array. If the variable exists and it is a coordinate variable, then the data will replace the coordinate data. If the variable does not exist in the dataset then a KeyError will be raised.
- Parameters
dataset (xr.Dataset) – The dataset where the data should be assigned.
data (NDArray[Any]) – The data to assign.
variable_name (str) – The name of the variable in the dataset to assign data to.
- Raises
KeyError – Raises a KeyError if the specified variable is not in the dataset’s coords or data_vars dictionary.
- Returns
The dataset with data assigned to it.
- Return type
xr.Dataset
-
tsdat.
decode_cf
(dataset: xarray.Dataset) → xarray.Dataset[source]¶ Decodes the dataset according to CF conventions. This helps ensure that the dataset is formatted and encoded correctly after it has been constructed or modified. This method is a thin wrapper around xarray.decode_cf().
- Parameters
dataset (xr.Dataset) – The dataset to decode.
- Returns
The decoded dataset.
- Return type
xr.Dataset
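The underlying xarray behavior can be seen directly. The example below calls plain xarray.decode_cf rather than tsdat's wrapper; the sample dataset is hypothetical:

```python
import numpy as np
import xarray as xr

# A dataset with a CF-style encoded time coordinate: integer offsets plus a
# 'units' attribute. Decoding converts the offsets into datetime64 values.
ds = xr.Dataset(
    {"time": ("time", np.array([0, 60, 120]),
              {"units": "seconds since 2022-04-24 00:00:00"})}
)
decoded = xr.decode_cf(ds)
```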
-
tsdat.get_filename(dataset: xarray.Dataset, extension: str, title: Optional[str] = None) → str[source]¶ Returns a key consisting of the dataset’s datastream, starting date/time, the extension, and an optional title. For file-based storage systems this method may be used to generate the basename of the output data file by providing extension as ‘.nc’, ‘.csv’, or some other file ending type. For ancillary plot files this can be used in the same way by specifying extension as ‘.png’, ‘.jpeg’, etc., and by specifying the title, resulting in files named like ‘<datastream>.20220424.165314.plot_title.png’.
- Parameters
dataset (xr.Dataset) – The dataset (used to extract the datastream and starting / ending times).
extension (str) – The file extension that should be used.
title (Optional[str]) – An optional title that will be placed between the start time and the extension in the generated filename.
- Returns
The filename constructed from provided parameters.
- Return type
str
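A stdlib-only sketch of the naming pattern described above. The real function derives the datastream and start time from the dataset itself; here they are passed in directly, and the helper name and exact separators are assumptions:

```python
from datetime import datetime
from typing import Optional

def sketch_get_filename(datastream: str, start: datetime, extension: str,
                        title: Optional[str] = None) -> str:
    # Mirrors the '<datastream>.YYYYMMDD.HHMMSS[.title]<extension>'
    # pattern shown in the description above.
    extension = extension.lstrip(".")
    parts = [datastream, start.strftime("%Y%m%d"), start.strftime("%H%M%S")]
    if title:
        parts.append(title)
    return ".".join(parts) + "." + extension

name = sketch_get_filename("abc.buoy_z01.a1",
                           datetime(2022, 4, 24, 16, 53, 14),
                           ".png", title="plot_title")
# → "abc.buoy_z01.a1.20220424.165314.plot_title.png"
```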
-
tsdat.get_start_date_and_time_str(dataset: xarray.Dataset) → Tuple[str, str][source]¶ Gets the start date and start time strings from a Dataset. The strings are formatted using strftime and the following formats:
date: “%Y%m%d”, time: “%H%M%S”
- Parameters
dataset (xr.Dataset) – The dataset whose start date and time should be retrieved.
- Returns
The start date and time as strings like “YYYYmmdd”, “HHMMSS”.
- Return type
Tuple[str, str]
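The quoted strftime formats applied to a sample start time (the timestamp value is illustrative):

```python
from datetime import datetime

start = datetime(2022, 4, 24, 16, 53, 14)
date_str = start.strftime("%Y%m%d")  # "20220424"
time_str = start.strftime("%H%M%S")  # "165314"
```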
-
tsdat.get_start_time(dataset: xarray.Dataset) → pandas.Timestamp[source]¶ Gets the earliest ‘time’ value and returns it as a pandas Timestamp, which resembles the built-in python datetime.datetime class.
- Parameters
dataset (xr.Dataset) – The dataset whose start time should be retrieved.
- Returns
The timestamp of the earliest time value in the dataset.
- Return type
pd.Timestamp
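The equivalent lookup can be done with plain xarray and pandas; the sample dataset below is hypothetical:

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset(coords={
    "time": pd.to_datetime(["2022-04-24 17:00:00", "2022-04-24 16:53:14"])
})
# Earliest 'time' value as a pandas Timestamp.
start = pd.Timestamp(ds["time"].min().values)
```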
-
tsdat.record_corrections_applied(dataset: xarray.Dataset, variable_name: str, message: str) → None[source]¶ Records the message on the ‘corrections_applied’ attribute of the specified variable in the dataset.
- Parameters
dataset (xr.Dataset) – The corrected dataset.
variable_name (str) – The name of the variable in the dataset.
message (str) – The message to record.
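A sketch of the attribute bookkeeping described above. Storing the messages as a list on the variable's attrs is an assumption about the representation, not tsdat's exact behavior:

```python
import numpy as np
import xarray as xr

def sketch_record_correction(dataset: xr.Dataset, variable_name: str,
                             message: str) -> None:
    # Append the message to the variable's 'corrections_applied' attribute,
    # creating the attribute if it does not exist yet.
    corrections = dataset[variable_name].attrs.get("corrections_applied", [])
    dataset[variable_name].attrs["corrections_applied"] = corrections + [message]

ds = xr.Dataset({"temperature": ("time", np.array([10.0, 11.0]))})
sketch_record_correction(ds, "temperature",
                         "Replaced spike at index 1 with fill value.")
```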
-
tsdat.recursive_instantiate(model: Any) → Any[source]¶ Recursively calls model.instantiate() on all ParameterizedConfigClass instances under the model, resulting in a new model which follows the same general structure as the given model, but possibly containing totally different properties and methods.
Note that this method does a depth-first traversal of the model tree to instantiate leaf nodes first. Traversing breadth-first would result in new pydantic models attempting to call the __init__ method of child models, which is not valid because the child models are ParameterizedConfigClass instances. Traversing depth-first allows us to first transform child models into the appropriate type using the classname of the ParameterizedConfigClass.
This method is primarily used to instantiate a Pipeline subclass and all of its properties from a yaml pipeline config file, but it can be applied to any other pydantic model.
- Parameters
model (Any) – The object to recursively instantiate.
- Returns
The recursively-instantiated object.
- Return type
Any
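The depth-first instantiation order can be illustrated with a stdlib-only toy. ToyConfig is a hypothetical stand-in for ParameterizedConfigClass, and the registry is an assumed lookup from classname to class; this is a sketch of the traversal, not tsdat's implementation:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class ToyConfig:
    """Stand-in for ParameterizedConfigClass: a classname plus parameters."""
    classname: str
    parameters: Dict[str, Any] = field(default_factory=dict)

    def instantiate(self) -> Any:
        registry = {"dict": dict}  # hypothetical classname registry
        return registry[self.classname](**self.parameters)

def sketch_recursive_instantiate(model: Any) -> Any:
    # Depth-first: children are transformed before the parent, so the
    # parent's instantiate() only ever sees already-instantiated values.
    if isinstance(model, ToyConfig):
        model.parameters = {k: sketch_recursive_instantiate(v)
                            for k, v in model.parameters.items()}
        return model.instantiate()
    if isinstance(model, dict):
        return {k: sketch_recursive_instantiate(v) for k, v in model.items()}
    if isinstance(model, list):
        return [sketch_recursive_instantiate(v) for v in model]
    return model  # plain leaf value

nested = ToyConfig("dict", {"inner": ToyConfig("dict", {"x": 1})})
result = sketch_recursive_instantiate(nested)
# → {"inner": {"x": 1}}
```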