Transformation / VAP Pipelines¶
Transformation pipelines, also referred to as Value-Added Product (VAP) pipelines, are tsdat pipelines that use data from several standardized input sources and combine them in ways that add value to the data.
Warning
Tsdat support for transformation pipelines is currently in an alpha phase, meaning that new features are being actively developed and APIs involved may be relatively unstable as new use cases are added and requirements are discovered. We greatly appreciate any feedback on this new capability.
Tsdat transformation pipelines are configured in almost exactly the same way as the ingestion pipelines you may already be used to. In fact, the tsdat TransformationPipeline class inherits all of its methods and attributes from the IngestPipeline class and only overrides the retriever code to ensure that input data are retrieved from the storage area.
Only the pipeline.yaml and retriever.yaml configuration files have any differences from their counterparts for a tsdat ingest. These are shown below.
Pipeline Configuration File¶
The pipeline configuration file for transformation pipelines is almost identical to its ingest pipeline counterpart. There are only two differences:
The classname should point to tsdat.TransformationPipeline, or a class derived from it.
The trigger should be empty since transformation pipelines are currently run manually.
An example transformation pipeline pipeline.yaml
file is shown below:
classname: tsdat.TransformationPipeline
triggers: {}
retriever:
path: pipelines/example_pipeline/config/retriever.yaml
dataset:
path: shared/config/dataset.yaml
quality:
path: shared/config/default-quality.yaml
storage:
path: shared/config/storage.yaml
Retriever Configuration File¶
The retriever configuration file for transformation pipelines is also similar to its ingest pipeline counterpart, but there are some notable differences, mostly pertaining to how data from various input sources should be combined. These are noted below:
The classname should point to tsdat.StorageRetriever, or a class derived from it.
If a coord (e.g., “time”) does not have any shape-modifying data_converters, then its shape remains unchanged
If a data_var does not have any shape-modifying converters then its shape must already match the shape of any coordinates that dimension it, or the pipeline will crash.
The NearestNeighbor data converter was added to map data variables onto the correct coordinate grid.
retriever.yaml
:
classname: tsdat.StorageRetriever
coords:
time:
.*buoy_z06\.a1.*:
name: time
data_converters: []
data_vars:
temperature:
.*buoy_z07\.a1.*:
name: temp
data_converters:
- classname: tsdat.io.converters.UnitsConverter
input_units: degF
- classname: tsdat.io.converters.NearestNeighbor
coord: time
humidity:
.*buoy_z07\.a1.*:
name: rh
data_converters:
- classname: tsdat.io.converters.NearestNeighbor
coord: time