Nexadata Workflows are designed to bring together structured, semi-structured, and unstructured data sources—such as application data, spreadsheets, and PDF reports—into a single, harmonized data stream.
Three core components sit at the heart of this process: Datasets, Pipelines, and Output Columns. Each plays a distinct role in how a Workflow functions.
Core Components of a Workflow
Datasets: The raw inputs, such as data from applications like Salesforce or HubSpot, CSV files, spreadsheets, or PDFs
Pipelines: Logic layers that map, clean, and transform the datasets
Output Pipeline: The final transformation stage that generates the Workflow’s output
Primary Output Pipeline: Each Workflow has one designated “Primary” Output Pipeline that establishes the Output Columns
Output Columns: The set of columns defined by the Primary Output Pipeline, representing the structure of the final output
Key Concept: All supporting Datasets and transformation Pipelines feed into a single, unified output structure defined by the Output Columns of the Primary Output Pipeline. This gives downstream consumers a consistent format to work with.
The Output Columns summary is always visible in the Workflow Setup tab to ensure alignment and transparency as you build and evolve your Workflow. 👍
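To make the relationship between Output Pipelines and Output Columns concrete, here is a minimal Python sketch. It is not Nexadata's API: the `OutputPipeline` and `Workflow` classes and the `misaligned_pipelines` check are hypothetical, and they assume each Output Pipeline can be described simply by the list of columns it emits.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OutputPipeline:
    # Hypothetical model: a pipeline name, the columns it emits,
    # and whether it is the Workflow's Primary Output Pipeline.
    name: str
    columns: list[str]
    primary: bool = False


@dataclass
class Workflow:
    output_pipelines: list[OutputPipeline] = field(default_factory=list)

    def primary_pipeline(self) -> OutputPipeline:
        # A Workflow has exactly one designated Primary Output Pipeline.
        primaries = [p for p in self.output_pipelines if p.primary]
        if len(primaries) != 1:
            raise ValueError(f"expected one Primary Output Pipeline, found {len(primaries)}")
        return primaries[0]

    def output_columns(self) -> list[str]:
        # The Primary Output Pipeline establishes the Output Columns.
        return list(self.primary_pipeline().columns)

    def misaligned_pipelines(self) -> dict[str, list[str]]:
        # Hypothetical check: for each non-primary Output Pipeline, list the
        # Output Columns it does not produce.
        expected = set(self.output_columns())
        report = {}
        for p in self.output_pipelines:
            missing = sorted(expected - set(p.columns))
            if not p.primary and missing:
                report[p.name] = missing
        return report


wf = Workflow([
    OutputPipeline("Output A", ["Invoice Date", "Vendor", "Amount"], primary=True),
    OutputPipeline("Output B", ["Invoice Date", "Vendor"]),
])
print(wf.output_columns())        # ['Invoice Date', 'Vendor', 'Amount']
print(wf.misaligned_pipelines())  # {'Output B': ['Amount']}
```

Deriving the columns from the primary pipeline, rather than storing them separately, keeps a single source of truth for the output structure.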
Use Case Examples
1. Collating PDF Invoices Across Vendors
Scenario: A company receives monthly invoices from various professional services firms in PDF format. Each vendor uses a different layout, and the data within each invoice may require extraction, normalization, and even derivation of calculated fields.
Workflow Solution:
Supporting Pipelines are configured to extract and normalize data from each vendor's invoice. Because invoice formats vary, some values may need to be derived. For example, activity-level billing can be calculated by dividing the total billed amount by the total hours to obtain an effective hourly rate, then multiplying that rate by the hours billed for each activity.
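The activity-level billing derivation is simple arithmetic, sketched here in plain Python. The function name is hypothetical, and it assumes the invoice's total hours equal the sum of its activity hours; `Decimal` is used so currency amounts do not pick up floating-point error.

```python
from __future__ import annotations

from decimal import Decimal, ROUND_HALF_UP


def allocate_activity_billing(
    total_billed: Decimal, activity_hours: dict[str, Decimal]
) -> dict[str, Decimal]:
    # Assumption: the invoice's total hours equal the sum of its activity hours.
    total_hours = sum(activity_hours.values(), Decimal("0"))
    if total_hours <= 0:
        raise ValueError("total hours must be positive")
    # Effective hourly rate implied by the invoice totals.
    rate = total_billed / total_hours
    # Apply the rate to each activity's hours, rounded to cents.
    return {
        activity: (rate * hours).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        for activity, hours in activity_hours.items()
    }


# Example: 12,000 billed for 40 hours implies a rate of 300 per hour.
print(allocate_activity_billing(
    Decimal("12000"),
    {"Research": Decimal("10"), "Drafting": Decimal("25"), "Review": Decimal("5")},
))
```

When the rate is not a whole number of cents, rounding each line separately can leave the allocated amounts a cent or two away from the invoice total, so a real pipeline may need a rule for assigning that remainder.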
Once each Supporting Pipeline is generating accurate and consistent datasets, the outputs are passed into one or more Output Pipelines. These pipelines:
Automatically stack and unify the data from all vendors
Apply consistent mapping rules to align billing activities to standard reporting categories
Transform all records into a unified set of Output Columns, such as Invoice Date, Vendor, Line Item, Amount, and Cost Center
The result is a harmonized dataset that supports consistent reporting and enables automated allocation of costs to departments.
This approach eliminates manual data entry and fragmented spreadsheet workflows, replacing them with a scalable, repeatable, and auditable process.
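The stacking and mapping steps performed by the Output Pipelines can be pictured with a short Python sketch. It assumes each Supporting Pipeline already emits records with hypothetical `date`, `activity`, and `amount` keys, and that the mapping rules are simple lookup tables. The category names and cost-center codes are invented placeholders, not values from Nexadata.

```python
from __future__ import annotations

OUTPUT_COLUMNS = ["Invoice Date", "Vendor", "Line Item", "Amount", "Cost Center"]

# Hypothetical mapping rules: vendor-specific activity labels -> standard
# reporting category, then category -> cost center.
ACTIVITY_TO_CATEGORY = {
    "contract review": "Legal",
    "legal research": "Legal",
    "tax advisory": "Tax",
    "audit fieldwork": "Audit",
}
CATEGORY_TO_COST_CENTER = {"Legal": "CC-100", "Tax": "CC-200", "Audit": "CC-300"}


def unify(vendor_outputs: dict[str, list[dict]]) -> list[dict]:
    """Stack every vendor's normalized records into rows keyed by OUTPUT_COLUMNS."""
    rows = []
    for vendor, records in vendor_outputs.items():
        for rec in records:
            # Normalize the label before lookup so "Contract Review " still matches.
            category = ACTIVITY_TO_CATEGORY.get(rec["activity"].strip().lower())
            values = [
                rec["date"],
                vendor,
                rec["activity"].strip(),
                rec["amount"],
                # Unknown activities are flagged rather than dropped.
                CATEGORY_TO_COST_CENTER.get(category, "UNMAPPED"),
            ]
            rows.append(dict(zip(OUTPUT_COLUMNS, values)))
    return rows


rows = unify({
    "Firm A": [{"date": "2025-01-31", "activity": "Contract Review ", "amount": 1500.0}],
    "Firm B": [
        {"date": "2025-01-31", "activity": "Tax advisory", "amount": 900.0},
        {"date": "2025-01-31", "activity": "Travel", "amount": 120.0},
    ],
})
for row in rows:
    print(row)
```

Flagging unmapped activities instead of silently discarding them keeps every billed amount visible for review, which supports an auditable process.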
2. Harmonizing CRM Data from Multiple Salesforce Instances
Scenario: An enterprise uses separate Salesforce instances across regions or business units. Each instance stores similar customer or sales data but with variations in field naming, formatting, or structure.
Workflow Solution:
Each Salesforce instance is configured as a Dataset with its own transformation Pipeline.
A single Output Pipeline consolidates these varied datasets, mapping fields like Customer Name, Opportunity Stage, and Close Date into a standardized format.
The Primary Output Pipeline defines the Output Columns, ensuring a clean, unified view of sales performance across the organization.
This enables company-wide reporting and forecasting without requiring changes to the original Salesforce instances.
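The field-mapping step can be sketched in Python as a per-instance rename followed by value normalization. The instance keys, source field names, and date formats below are hypothetical stand-ins for whatever differences your Salesforce instances actually have; this is an illustration of the idea, not how Nexadata configures mappings.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical per-instance field maps: source field -> standard Output Column.
FIELD_MAPS = {
    "emea": {"Account_Name": "Customer Name", "Stage": "Opportunity Stage", "Close_Dt": "Close Date"},
    "apac": {"Customer": "Customer Name", "Opp_Stage__c": "Opportunity Stage", "Expected_Close": "Close Date"},
}
# Hypothetical date formats used by each instance.
DATE_FORMATS = {"emea": "%Y-%m-%d", "apac": "%d/%m/%Y"}


def harmonize(instance: str, record: dict) -> dict:
    """Rename one source record's fields and normalize its values."""
    row = {standard: record[source] for source, standard in FIELD_MAPS[instance].items()}
    row["Customer Name"] = row["Customer Name"].strip()
    # Normalize every close date to ISO 8601 (YYYY-MM-DD).
    parsed = datetime.strptime(row["Close Date"], DATE_FORMATS[instance])
    row["Close Date"] = parsed.date().isoformat()
    return row


print(harmonize("emea", {"Account_Name": "Globex GmbH", "Stage": "Negotiation", "Close_Dt": "2025-06-30"}))
print(harmonize("apac", {"Customer": " Acme Pte Ltd ", "Opp_Stage__c": "Closed Won", "Expected_Close": "31/03/2025"}))
```

Because each instance has its own mapping table, onboarding another region means adding one entry here, with no change to the source instance itself.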
3. Integrating Marketing and Product Usage Data for Customer Insights
Scenario: A SaaS company wants to combine campaign engagement data from HubSpot with in-product usage metrics from their application logs to better understand customer behavior.
Workflow Solution:
Marketing data from HubSpot (structured data) and product usage data from logs are ingested as separate Datasets.
Supporting Pipelines normalize and map both data sources.
A single Output Pipeline merges these sources into one output stream, aligning fields such as User ID, Campaign Name, Feature Used, and Timestamp.
The result is a unified dataset that supports customer segmentation, churn prediction, and campaign optimization.
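The text does not specify how campaign and usage records are aligned, so the Python sketch below assumes one plausible rule: each usage event is attributed to the most recent HubSpot campaign the same user engaged with at or before the event (an as-of join). The record keys (`user_id`, `campaign`, `feature`, `timestamp`) are hypothetical, and this is plain Python rather than Nexadata configuration.

```python
from __future__ import annotations

import bisect
from collections import defaultdict
from datetime import datetime


def merge_streams(engagements: list[dict], usage_events: list[dict]) -> list[dict]:
    """Attach to each usage event the user's latest campaign touch at or before it."""
    touches = defaultdict(list)
    for e in engagements:
        touches[e["user_id"]].append((e["timestamp"], e["campaign"]))
    # Per user: sorted touch times and the matching campaign names.
    index = {}
    for user, items in touches.items():
        items.sort()
        index[user] = ([t for t, _ in items], [c for _, c in items])

    rows = []
    for ev in sorted(usage_events, key=lambda item: item["timestamp"]):
        times, campaigns = index.get(ev["user_id"], ([], []))
        # Position just past the last touch whose time is <= the event time.
        i = bisect.bisect_right(times, ev["timestamp"])
        rows.append({
            "User ID": ev["user_id"],
            "Campaign Name": campaigns[i - 1] if i else None,
            "Feature Used": ev["feature"],
            "Timestamp": ev["timestamp"].isoformat(),
        })
    return rows


ts = datetime.fromisoformat
engagements = [
    {"user_id": "u1", "campaign": "Spring Launch", "timestamp": ts("2025-03-01T09:00")},
    {"user_id": "u1", "campaign": "Webinar", "timestamp": ts("2025-03-10T12:00")},
]
usage = [
    {"user_id": "u1", "feature": "Exports", "timestamp": ts("2025-03-12T08:30")},
    {"user_id": "u1", "feature": "Dashboards", "timestamp": ts("2025-03-05T14:00")},
    {"user_id": "u2", "feature": "Dashboards", "timestamp": ts("2025-03-06T10:15")},
]
for row in merge_streams(engagements, usage):
    print(row)
```

Other attribution rules (first touch, a fixed lookback window) would change only the lookup step; users with no prior campaign touch keep an empty Campaign Name rather than being dropped.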