Workflow Pipelines
Once one or more Datasets have been added to a Workflow, the next step is to create Pipelines to process, transform, and prepare the data for output. Pipelines are where your business logic, mapping, and data harmonization take place.
In the Pipelines tab, you can add two types of pipelines: Supporting Pipelines and Output Pipelines.
In this view, Pipelines are grouped into two sections:
Supporting Pipelines – Shown at the top, with name, last edit, and timestamp
Output Pipelines – Listed below, including indicators for column match status and which one is marked as Primary (denoted by a shield icon)
Use the “New Pipeline” button to create and configure additional pipelines as needed.
Supporting Pipelines
Supporting Pipelines are executed first in the Workflow and are ideal for preprocessing, enrichment, or intermediate transformations. Key features include:
Reusability: Their outputs can be referenced by other Supporting Pipelines or Output Pipelines.
Independent Schema: They do not need to match the final Workflow Output Columns.
Flexible Order: You can change their execution order, allowing one Supporting Pipeline to rely on the output of another.
Joins and Limitations: Joining the outputs of two Supporting Pipelines directly is not supported. You can achieve a similar result by joining the source datasets within a single pipeline using the Join transformation.
Use Supporting Pipelines to organize complex transformations into manageable stages, especially when dealing with multiple input formats or enrichment layers.
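As a rough illustration of why execution order matters, the staging idea can be sketched in plain Python. This is not Nexadata code: the stage functions, the run_supporting helper, and the orders dataset are all hypothetical names invented for the example, and the sketch assumes a pipeline can only read the outputs of pipelines that run before it.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Table = List[Row]
# A stage receives the source datasets and the outputs produced so far.
Stage = Callable[[Dict[str, Table], Dict[str, Table]], Table]


def clean_orders(datasets: Dict[str, Table], outputs: Dict[str, Table]) -> Table:
    """Stage 1 (hypothetical): normalize SKU codes from a source dataset."""
    return [
        {**row, "sku": str(row["sku"]).strip().upper()}
        for row in datasets["orders"]
    ]


def totals_by_sku(datasets: Dict[str, Table], outputs: Dict[str, Table]) -> Table:
    """Stage 2 (hypothetical): aggregate the output of clean_orders."""
    totals: Dict[str, int] = {}
    for row in outputs["clean_orders"]:
        sku = str(row["sku"])
        totals[sku] = totals.get(sku, 0) + int(row["qty"])
    return [{"sku": sku, "qty": qty} for sku, qty in totals.items()]


def run_supporting(stages: List[Tuple[str, Stage]],
                   datasets: Dict[str, Table]) -> Dict[str, Table]:
    """Run stages top to bottom; each stage sees only earlier outputs."""
    outputs: Dict[str, Table] = {}
    for name, stage in stages:
        outputs[name] = stage(datasets, outputs)
    return outputs


datasets = {
    "orders": [
        {"sku": " a1 ", "qty": 2},
        {"sku": "A1", "qty": 3},
        {"sku": "b2", "qty": 1},
    ]
}
outputs = run_supporting(
    [("clean_orders", clean_orders), ("totals_by_sku", totals_by_sku)],
    datasets,
)
```

In this sketch, listing totals_by_sku before clean_orders raises a KeyError, because the output it depends on has not been produced yet. That mirrors the reason for reordering Supporting Pipelines when one relies on another.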
Output Pipelines
Output Pipelines produce the final output of the Workflow. They must match the structure of the Workflow’s Output Columns, which are defined by the Output Pipeline marked as Primary.
Key Features of Output Pipelines:
Schema Conformance: All Output Pipelines must match the Output Columns defined by the Primary Output Pipeline.
Stacking: If multiple Output Pipelines are used, their outputs are automatically stacked into one combined dataset.
Primary Designation: One Output Pipeline must be set as Primary. This pipeline defines the Workflow’s Output Columns. You can switch the Primary designation at any time, and the new Primary will redefine the expected structure for all Output Pipelines.
Column Match Status: Pipelines that don’t match the Primary’s schema are flagged in the Columns Match column.
Prescriptive Feedback: When you edit a non-conforming Output Pipeline, Nexadata displays messages that show exactly what needs to change for the pipeline to match.
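The conformance and stacking rules above can be approximated with a short Python sketch. It is only a conceptual model, not Nexadata's implementation: column_feedback and stack_outputs are hypothetical helpers, and the sketch assumes columns are compared by name only, ignoring order and data type, which the product may handle differently.

```python
from typing import Dict, List

Row = Dict[str, object]


def column_feedback(primary_columns: List[str],
                    pipeline_columns: List[str]) -> Dict[str, object]:
    """Compare a pipeline's columns with the Primary's, by name only."""
    missing = [c for c in primary_columns if c not in pipeline_columns]
    extra = [c for c in pipeline_columns if c not in primary_columns]
    return {
        "matches": not missing and not extra,
        "missing": missing,
        "extra": extra,
    }


def stack_outputs(primary_columns: List[str],
                  pipeline_rows: Dict[str, List[Row]]) -> List[Row]:
    """Append the rows of every conforming pipeline into one combined table."""
    stacked: List[Row] = []
    for name, rows in pipeline_rows.items():
        for row in rows:
            feedback = column_feedback(primary_columns, list(row.keys()))
            if not feedback["matches"]:
                raise ValueError(
                    f"{name}: missing {feedback['missing']}, "
                    f"extra {feedback['extra']}"
                )
            # Emit columns in the Primary's order (an assumption of this sketch).
            stacked.append({c: row[c] for c in primary_columns})
    return stacked


primary = ["account", "amount"]  # columns defined by the Primary pipeline
feedback = column_feedback(primary, ["account", "amt"])
combined = stack_outputs(
    primary,
    {
        "us_sales": [{"account": "A", "amount": 1}],
        "eu_sales": [{"amount": 2, "account": "B"}],
    },
)
```

Here feedback reports amount as missing and amt as extra, which is the kind of information a prescriptive message would need to convey. Switching the Primary corresponds to passing a different primary list, after which every other pipeline is checked against the new columns.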