Workflow Concepts

Understand how Datasets, Pipelines, and Output Columns work together in Nexadata Workflows to create unified, analysis-ready data streams.

Written by Ryan Curtin
Updated over 2 weeks ago

Nexadata Workflows are designed to bring together structured, semi-structured, and unstructured data sources—such as application data, spreadsheets, and PDF reports—into a single, harmonized data stream.

At the heart of this process are three core components: Datasets, Pipelines, and Output Columns—each playing a vital role in how a Workflow functions.

Core Components of a Workflow

  • Datasets: The raw inputs, which may come from applications such as Salesforce or HubSpot, CSV files, spreadsheets, or PDFs

  • Pipelines: Logic layers that map, clean, and transform the datasets

  • Output Pipeline: The final transformation stage that generates the Workflow’s output

  • Primary Output Pipeline: Each Workflow has one designated “Primary” Output Pipeline that establishes the Output Columns

  • Output Columns: The set of columns defined by the Primary Output Pipeline, representing the structure of the final output

Key Concept: All supporting datasets and transformation Pipelines feed into a single, unified output structure—defined by the Output Columns of the Primary Output Pipeline. This creates a consistent format for downstream use.

The Output Columns summary is always visible in the Workflow Setup tab to ensure alignment and transparency as you build and evolve your Workflow. 👍
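
To make the relationships concrete, here is a minimal, purely illustrative sketch of how these pieces fit together, written as hypothetical Python dataclasses. This is not Nexadata's API or data model; the names and fields are invented for explanation only.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Dataset:
    """A raw input: an application connection, CSV, spreadsheet, or PDF."""
    name: str
    source_type: str  # e.g. "salesforce", "csv", "pdf"

@dataclass
class Pipeline:
    """A logic layer that maps, cleans, and transforms one or more Datasets."""
    name: str
    inputs: List[Dataset]
    columns: List[str] = field(default_factory=list)  # columns this pipeline emits

@dataclass
class Workflow:
    supporting_pipelines: List[Pipeline]
    output_pipelines: List[Pipeline]
    primary_output_pipeline: Pipeline

    @property
    def output_columns(self) -> List[str]:
        # The Primary Output Pipeline establishes the Output Columns
        # that every other pipeline's output is aligned to.
        return self.primary_output_pipeline.columns
```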


Use Case Examples

1. Collating PDF Invoices Across Vendors

Scenario: A company receives monthly invoices from various professional services firms in PDF format. Each vendor uses a different layout, and the data within each invoice may require extraction, normalization, and even derivation of calculated fields.

Workflow Solution:


Supporting Pipelines are configured to extract and normalize data from each vendor’s invoice. Since invoice formats vary, certain values may need to be derived—such as calculating activity-level billing by taking the total billed amount, dividing it by total hours, and applying the resulting rate to the time billed for each activity.
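
For example, if an invoice lists only a total amount and total hours, an effective hourly rate can be derived and applied to each activity's billed time. The snippet below is a minimal sketch of that calculation with made-up numbers; it illustrates the arithmetic, not a Nexadata formula.

```python
# Hypothetical invoice totals extracted from one vendor's PDF.
total_billed = 12_500.00                   # total amount on the invoice
total_hours = 100.0                        # total hours on the invoice
hourly_rate = total_billed / total_hours   # derived rate: 125.00 per hour

# Apply the derived rate to each activity's billed time.
activities = [
    {"activity": "Discovery workshop", "hours": 16.0},
    {"activity": "Implementation",     "hours": 64.0},
    {"activity": "Training",           "hours": 20.0},
]
for a in activities:
    a["amount"] = round(a["hours"] * hourly_rate, 2)
# -> Discovery 2000.00, Implementation 8000.00, Training 2500.00 (sums to 12,500.00)
```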

Once each Supporting Pipeline is generating accurate and consistent datasets, the outputs are passed into one or more Output Pipelines. These pipelines:

  • Automatically stack and unify the data from all vendors

  • Apply consistent mapping rules to align billing activities to standard reporting categories

  • Transform all records into a unified set of Output Columns such as Invoice Date, Vendor, Line Item, Amount, and Cost Center

The result is a harmonized dataset that supports consistent reporting and enables automated allocation of costs to departments.
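
To make the stacking and mapping steps concrete, here is a rough pandas sketch of the same idea. The vendor data, column names, and category map are hypothetical, and this is an illustration of the logic rather than how a Nexadata Workflow is configured.

```python
import pandas as pd

# Normalized per-vendor outputs from the Supporting Pipelines (hypothetical).
vendor_a = pd.DataFrame({
    "Invoice Date": ["2024-05-31"], "Vendor": ["Firm A"],
    "Line Item": ["Implementation"], "Amount": [8000.00],
})
vendor_b = pd.DataFrame({
    "Invoice Date": ["2024-05-31"], "Vendor": ["Firm B"],
    "Line Item": ["Legal review"], "Amount": [3200.00],
})

# 1. Stack and unify the data from all vendors.
combined = pd.concat([vendor_a, vendor_b], ignore_index=True)

# 2. Apply consistent mapping rules to align activities to reporting categories.
category_map = {"Implementation": "Engineering", "Legal review": "Legal"}
combined["Cost Center"] = combined["Line Item"].map(category_map)

# 3. The result matches the Output Columns defined by the Primary Output Pipeline.
output_columns = ["Invoice Date", "Vendor", "Line Item", "Amount", "Cost Center"]
print(combined[output_columns])
```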

This approach eliminates manual data entry and fragmented spreadsheet workflows, replacing them with a scalable, repeatable, and auditable process.


2. Harmonizing CRM Data from Multiple Salesforce Instances

Scenario: An enterprise uses separate Salesforce instances across regions or business units. Each instance stores similar customer or sales data but with variations in field naming, formatting, or structure.

Workflow Solution:

  • Each Salesforce instance is configured as a Dataset with its own transformation Pipeline.

  • A single Output Pipeline consolidates these varied datasets, mapping fields like Customer Name, Opportunity Stage, and Close Date into a standardized format.

  • The Primary Output Pipeline defines the Output Columns, ensuring a clean, unified view of sales performance across the organization.

This enables company-wide reporting and forecasting without requiring changes to the original Salesforce instances.
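
As a rough illustration of the field-level harmonization, the sketch below renames differing field names from two instances into one shared schema. The field names are invented for the example and do not reflect Salesforce's or Nexadata's actual APIs.

```python
import pandas as pd

# Hypothetical extracts from two Salesforce instances with different field names.
emea = pd.DataFrame({"Account_Name__c": ["Acme GmbH"], "Stage": ["Negotiation"],
                     "CloseDt": ["2024-09-30"]})
amer = pd.DataFrame({"AccountName": ["Acme Inc"], "OpportunityStage": ["Proposal"],
                     "Close_Date": ["2024-10-15"]})

# Each instance's transformation Pipeline renames fields to the shared schema.
emea = emea.rename(columns={"Account_Name__c": "Customer Name",
                            "Stage": "Opportunity Stage",
                            "CloseDt": "Close Date"})
amer = amer.rename(columns={"AccountName": "Customer Name",
                            "OpportunityStage": "Opportunity Stage",
                            "Close_Date": "Close Date"})

# The Output Pipeline consolidates both into the Output Columns
# defined by the Primary Output Pipeline.
unified = pd.concat([emea, amer], ignore_index=True)
print(unified[["Customer Name", "Opportunity Stage", "Close Date"]])
```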


3. Integrating Marketing and Product Usage Data for Customer Insights

Scenario: A SaaS company wants to combine campaign engagement data from HubSpot with in-product usage metrics from its application logs to better understand customer behavior.

Workflow Solution:

  • Marketing data from HubSpot (structured data) and product usage data from logs are ingested as separate Datasets.

  • Supporting Pipelines normalize and map both data sources.

  • A single Output Pipeline merges these datasets into one unified output, aligning fields such as User ID, Campaign Name, Feature Used, and Timestamp.

  • The result is a powerful dataset for customer segmentation, churn prediction, and campaign optimization.
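
For illustration, the merge step might look roughly like this in pandas. The column names and sample values are hypothetical, and this sketches the underlying logic rather than Nexadata's configuration.

```python
import pandas as pd

# Hypothetical normalized outputs from the two Supporting Pipelines.
hubspot = pd.DataFrame({
    "User ID": ["u-101", "u-102"],
    "Campaign Name": ["Spring Launch", "Webinar Q2"],
})
usage = pd.DataFrame({
    "User ID": ["u-101", "u-101", "u-102"],
    "Feature Used": ["Dashboards", "Exports", "Dashboards"],
    "Timestamp": ["2024-05-01T10:02:00Z", "2024-05-02T14:11:00Z", "2024-05-03T09:45:00Z"],
})

# The Output Pipeline aligns the two sources on User ID so each usage event
# carries the campaign the user engaged with.
combined = usage.merge(hubspot, on="User ID", how="left")
print(combined[["User ID", "Campaign Name", "Feature Used", "Timestamp"]])
```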
