Nexadata Pipelines are end-to-end data transformation processes that take one or more datasets through a series of operations. Each pipeline processes record sets, applying transformations step by step and passing each step's output to the next operation. You input one or more datasets at the start of a pipeline, and at the end the pipeline produces a fully transformed CSV.
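Conceptually, a pipeline is just an ordered list of steps where each step receives the records the previous step produced. The sketch below illustrates that idea in plain Python; the names and record layout are hypothetical and not Nexadata's internal API.

```python
# Illustrative sketch only: a pipeline as an ordered list of steps,
# where each step receives the records produced by the previous step.
from typing import Callable

Records = list[dict]
Step = Callable[[Records], Records]

def run_pipeline(records: Records, steps: list[Step]) -> Records:
    """Apply each transformation in order, feeding output to the next step."""
    for step in steps:
        records = step(records)
    return records

data = [{"id": 1, "value": 50}, {"id": 2, "value": 150}]
steps = [
    lambda rs: [r for r in rs if r["value"] >= 100],  # filter step
    lambda rs: sorted(rs, key=lambda r: r["id"]),     # sort step
]
print(run_pipeline(data, steps))  # [{'id': 2, 'value': 150}]
```

The key property this models is the one the guide relies on throughout: steps run in sequence, and reordering them can change the result.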
Follow this guide to set up your Nexadata Pipeline:
1. Prepare Your Datasets
A pipeline begins with one or more CSV datasets. These files should be properly formatted to ensure a smooth transformation process. If your pipeline involves multiple datasets, the first operation should be to join them before applying other transformations.
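A quick pre-flight check can catch formatting problems before they surface mid-pipeline. This is a hypothetical helper, not part of Nexadata: it verifies that a CSV has a header row and that every record has the same number of fields.

```python
# Hypothetical pre-flight check (not a Nexadata feature): confirm that
# every data row has the same number of fields as the header row.
import csv
import io

def validate_csv(text: str) -> list[str]:
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    problems = []
    # Data rows start on line 2, after the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
    return problems

sample = "id,value\n1,50\n2,150\n"
print(validate_csv(sample))  # []
```

An empty list means the file is structurally consistent; any entries pinpoint the rows to fix before uploading.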
2. Define Transformations
You can define transformations in Nexadata Pipelines in two ways:
Natural Language Mode (⚡ Transform)
In this mode, you describe the operation you want to perform in plain, natural language. For example, type "Filter out records where the value is less than 100," and the system automatically configures the transformation based on your input.
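For comparison, the prompt above corresponds to a filter like the following. This is purely illustrative code, not what Nexadata generates internally; note that "filter out records where the value is less than 100" means keeping records whose value is 100 or more.

```python
# The prompt "Filter out records where the value is less than 100"
# keeps only records with value >= 100 (illustrative sketch only).
records = [
    {"id": 1, "value": 50},
    {"id": 2, "value": 150},
    {"id": 3, "value": 99},
]
filtered = [r for r in records if r["value"] >= 100]
print(filtered)  # [{'id': 2, 'value': 150}]
```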
Advanced Mode (💡 Advanced)
For more control, you can select and configure transformations through the UI, precisely defining each step by setting up the transformation manually.
Regardless of which mode you used to create a step, you can open it in Advanced Mode to refine and edit it, giving you full flexibility to adjust your pipeline as needed.
Supported Transformations
Nexadata Pipelines support a range of data transformations, including:
Filter: Select records based on specified criteria.
Aggregate: Summarize data through operations like sum, count, or average.
Join: Combine multiple datasets into one based on a common key.
Sort: Arrange records in ascending or descending order.
Group By: Group records based on a specified column and apply aggregate functions.
Column Rename: Rename columns to align with your schema requirements.
Calculate: Add calculated fields based on existing data.
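To make these operations concrete, the sketch below shows several of them applied to a small record set in plain Python. The dataset and field names are invented for illustration; Nexadata configures the equivalent steps for you.

```python
# Illustrative sketches of several supported transformations, using
# plain Python lists of dicts in place of Nexadata's record sets.
from itertools import groupby

orders = [
    {"region": "east", "amount": 40},
    {"region": "west", "amount": 10},
    {"region": "east", "amount": 60},
]

# Filter: keep records matching a condition
big = [r for r in orders if r["amount"] > 30]

# Sort: arrange records in descending order of amount
by_amount = sorted(orders, key=lambda r: r["amount"], reverse=True)

# Group By + Aggregate: sum amounts per region
# (groupby requires the input to be sorted by the grouping key)
keyed = sorted(orders, key=lambda r: r["region"])
totals = {
    region: sum(r["amount"] for r in rows)
    for region, rows in groupby(keyed, key=lambda r: r["region"])
}

# Column Rename: amount -> sales
renamed = [{"region": r["region"], "sales": r["amount"]} for r in orders]

# Calculate: add a derived field from existing data
with_tax = [{**r, "amount_with_tax": round(r["amount"] * 1.1, 2)} for r in orders]

print(totals)  # {'east': 100, 'west': 10}
```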
3. Build Your Pipeline
To build a pipeline:
Start with Your Datasets: Input one or more properly formatted CSV files as the starting point.
Join Multiple Datasets: If more than one dataset is involved, begin by joining them using a common key. This ensures all datasets are properly combined before applying other transformations.
Apply Transformations: Add the necessary transformations in sequence. Each step takes the output from the previous one and passes it along to the next.
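The recommended build order (join first, then transform) can be sketched as follows. The two datasets, their key, and the steps are hypothetical examples, not output from Nexadata.

```python
# Hypothetical sketch of the build order: join two datasets on a
# common key first, then transform the combined records in sequence.
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Bo"}]
orders = [
    {"cust_id": 1, "amount": 120},
    {"cust_id": 1, "amount": 80},
    {"cust_id": 2, "amount": 200},
]

# Step 1 - Join: combine the datasets on cust_id
by_id = {c["cust_id"]: c for c in customers}
joined = [{**by_id[o["cust_id"]], **o} for o in orders]

# Step 2 - Filter: keep orders of at least 100
filtered = [r for r in joined if r["amount"] >= 100]

# Step 3 - Sort: largest orders first
result = sorted(filtered, key=lambda r: r["amount"], reverse=True)
print([(r["name"], r["amount"]) for r in result])  # [('Bo', 200), ('Ada', 120)]
```

Each step consumes the previous step's output, which is why the join must come first: the later steps operate on the combined records.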
Natural Language vs. Advanced Mode
You can build each step of your pipeline in Natural Language Mode or in Advanced Mode, and mix the two freely within a single pipeline.
As you build your pipeline, visual cues help you identify whether each step was configured using natural language or advanced mode:
Natural Language Mode is denoted by a lightning bolt icon ⚡.
Hover over the lightning bolt to view the precise natural language prompt that was used for the transformation.
Advanced Mode is indicated by a lightbulb icon 💡.
You can seamlessly switch between these modes at any step to edit and refine your pipeline as needed.
4. Pipeline Execution
Once your pipeline is configured, execute it to process the dataset(s). The pipeline will run each transformation step in sequence, ensuring that the output from one step is passed to the next.
5. Final Output
Upon completion, the pipeline generates a final CSV containing the fully transformed dataset. This output file can be downloaded or used as input for other processes, depending on your data workflow needs.
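The final write-out step is equivalent to serializing the transformed records back to CSV, as in this stdlib sketch (the records and column names are invented for illustration):

```python
# Illustrative sketch of the final step: writing transformed records
# out as a CSV that downstream processes can consume.
import csv
import io

records = [{"id": 2, "value": 150}, {"id": 4, "value": 300}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "value"], lineterminator="\n")
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
# id,value
# 2,150
# 4,300
```

In practice you would write to a file rather than an in-memory buffer; the header row comes from the field names, so column renames applied earlier in the pipeline carry through to the output.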
Summary of Steps
Start with one or more properly formatted CSV datasets.
If multiple datasets are used, apply a join operation to combine them.
Define the necessary transformations using Natural Language or Advanced Mode.
Build your pipeline step-by-step, leveraging visual cues to track whether each step was configured using natural language (⚡) or advanced mode (💡).
Execute the pipeline to transform your data.
The final output will be a transformed CSV ready for use.