Pipeline Configuration
The pipeline config file, pipeline.yaml, describes the configuration of your pipeline:
Triggers - which input file patterns should trigger this pipeline
Ingest Class - class name of the ingest pipeline to use
Dependent Config Files - which YAML files to use for the retriever, dataset, quality management, and storage
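To illustrate the Triggers entry, here is a minimal sketch of how regex trigger patterns could be checked against an input file path. The helper name matching_triggers is hypothetical, and the use of re.match is an assumption; this page does not state which matching function Tsdat itself applies.

```python
import re

# Trigger pattern taken from the example pipeline config on this page.
triggers = [r".*example_ingest.*\.csv"]


def matching_triggers(input_key, patterns):
    """Return the patterns that match the given input path.

    Hypothetical helper for illustration only. re.match anchors at the
    start of the string (an assumption about the matching mode).
    """
    return [pattern for pattern in patterns if re.match(pattern, input_key)]


# A path containing "example_ingest" and ".csv" matches the pattern.
print(matching_triggers("data/example_ingest/input.csv", triggers))

# A path from an unrelated pipeline does not match.
print(matching_triggers("data/other_ingest/input.csv", triggers))
```

Because the pattern is not anchored at the end, re.match would also accept a name such as input.csv.bak; appending $ to the pattern is one way to make it stricter.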
Each pipeline template includes a starter pipeline config file in the config folder. It works out of the box, but you should adjust the configuration to the specifics of your pipeline. See the Getting Started section for more information on setting up a template.
Note
To prevent redundancy, Tsdat config files are designed to be shared across multiple pipelines. In the pipeline config file, you can specify a shared config file to use (e.g., shared/config/dataset.yaml) and then override specific values in the overrides section.
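The override idea can be sketched in a few lines of Python. This is not Tsdat's implementation: apply_overrides is a hypothetical helper, the shared dataset contents are made up for illustration, and the sketch assumes that each override key is a slash-separated path into the nested config.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of config with slash-separated path overrides applied.

    Hypothetical helper; intermediate mappings are created when missing.
    """
    result = deepcopy(config)  # leave the shared config untouched
    for path, value in overrides.items():
        keys = [key for key in path.split("/") if key]
        if not keys:
            raise ValueError(f"override path is empty: {path!r}")
        node = result
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return result


# Stand-in for the contents of a shared dataset config (made-up values).
shared_dataset = {
    "attrs": {"title": "Example dataset", "location_id": "abc"},
    "data_vars": {"first": {"attrs": {"units": "m"}}},
}

# Override keys in the path style used by the overrides section.
overrides = {
    "/attrs/location_id": "sgp",
    "/data_vars/first/attrs/new_attribute": "please add this attribute",
}

merged = apply_overrides(shared_dataset, overrides)
print(merged["attrs"]["location_id"])
print(merged["data_vars"]["first"]["attrs"])
```

Copying the shared config before applying overrides keeps the original intact, so several pipelines can reuse it with different overrides.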
An annotated example of an ingest pipeline config file is provided below:
    # Name of the Ingest Pipeline to use
    classname: tsdat.pipeline.ingest.IngestPipeline

    # Regex patterns that should trigger this pipeline
    triggers:
      - .*example_ingest.*\.csv

    # Retriever config
    retriever:
      path: pipelines/example_ingest/config/retriever.yaml

    # Dataset config. In this example, we use a dataset.yaml file that is shared across multiple pipelines,
    # but we override one global attribute specifying a different location and we add one additional variable attribute.
    dataset:
      path: shared/config/dataset.yaml
      overrides:
        /attrs/location_id: sgp
        /data_vars/first/attrs/new_attribute: please add this attribute

    # Quality config - shared across multiple pipelines
    quality:
      path: shared/config/default-quality.yaml

    # Storage config - shared across multiple pipelines
    storage:
      path: shared/config/storage.yaml
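The classname entry is a dotted Python path. As a rough sketch of how such a path can be resolved to a class, the hypothetical helper load_class below uses importlib from the standard library. How Tsdat resolves the name internally is not described on this page, and the example resolves a standard-library class so that it runs without Tsdat installed.

```python
import importlib


def load_class(dotted_path):
    """Import and return the object named by a dotted path.

    Hypothetical helper for illustration: everything before the last dot
    is treated as the module, and the final component as the class name.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Standard-library stand-in; with Tsdat installed, the classname value
# from the pipeline config could be passed in the same way.
cls = load_class("collections.OrderedDict")
print(cls.__name__)
```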