The Join Transformation in Nexadata Pipelines allows you to combine two or more datasets by matching a common key, making it possible to create a unified dataset with information from multiple sources. This capability provides a graphical interface to help you define the join type, select the relevant columns, and visualize the output dataset structure. The Join Transformation supports inner, outer, left, and right joins, all accessible through a user-friendly graphical approach that requires no advanced technical skills.
Inputs for the Join Transformation
Name of the Transformation: The Join Transformation is automatically named in Natural Language Mode but can be customized in Advanced Mode. For instance, if you're merging customer and sales data, you might rename it to something like "Customer Sales Data Join."
Join Type: Nexadata’s Join Transformation offers several join types to fit different data merging needs. These join types determine which records from each dataset appear in the final output.
Inner Join: Includes only records that have matching values in both datasets.
Example: Merging sales and customer datasets, only showing records for customers with completed sales.
Outer Join: Combines all records from both datasets, filling with null where there are no matches.
Example: Showing all customers, including those without sales, and all sales records, even if there’s no customer match.
Left Join: Includes all records from the left dataset and matching records from the right dataset.
Example: Including all customers, with corresponding sales data where available.
Right Join: Includes all records from the right dataset and matching records from the left dataset.
Example: Showing all sales records and including customer data if there’s a matching customer ID.
Datasets to Join: Choose the datasets that you want to combine. In Natural Language Mode, you can simply specify the names of the datasets to join. For example, "Join Customer Dataset with Sales Dataset."
Join Key: Select the common key (column) that both datasets share. This key is used to match records, ensuring that the data aligns correctly. For instance, you might use “Customer ID” as the join key between customer and sales datasets.
Columns to Include in the Output Dataset: Choose the specific columns from each dataset that you want in the output. This helps create a streamlined, customized dataset.
Example: From the customer dataset, you might include "Customer Name" and "Customer ID," and from the sales dataset, "Order ID" and "Order Total."
Using Natural Language Mode
In Natural Language Mode, you can set up a join using simple instructions, and Nexadata will automatically configure the transformation. The name of the transformation is also generated based on your instruction but can be customized later if needed.
Example Instructions in Natural Language Mode
Join the Customer Dataset and Sales Dataset using Customer ID.
Combine Orders and Product Datasets, using Product ID as the key.
Merge Employee and Project datasets on Employee ID.
Note: If Natural Language Mode doesn’t capture the specifics of your join needs, switch to Advanced Mode to manually configure the transformation.
Using Advanced Mode
Advanced Mode allows detailed customization of the Join Transformation, including manual specification of join types, selected columns, and column names.
Steps in Advanced Mode
Name of the Transformation: Give your transformation a custom name, like "Customer Sales Join."
Select Datasets to Join: Identify which datasets you want to join, such as "Customer" and "Sales."
Choose the Join Key: Specify the column used to connect both datasets (e.g., "Customer ID").
Select a Join Type: Choose from Inner, Outer, Left, or Right joins, depending on your dataset needs.
Define Output Columns: Choose the specific columns from each dataset to include in the output.
Example in Advanced Mode
Name of the Transformation: Customer Sales Data Join
Datasets to Join: Customer and Sales
Join Key: Customer ID
Join Type: Inner Join
Selected Columns:
From Customer dataset: Customer Name, Customer ID
From Sales dataset: Order ID, Order Total
This configuration will combine data from both datasets where the Customer ID matches, creating a unified dataset with customer and sales information for matched records.
Example Use Cases
Customer Sales Analysis: Use an Inner Join to merge customer and sales datasets, allowing you to analyze purchase data for each customer.
Inventory and Supplier Matching: Combine inventory and supplier datasets using Product ID as a key, using a Left Join to include all inventory items and associated suppliers.
Employee Project Assignments: Use a Right Join on Employee ID to create a dataset listing all projects, along with assigned employees.
Summary
The Join Transformation in Nexadata Pipelines simplifies the process of merging datasets with a graphical interface and flexible join options. Whether using Natural Language Mode for quick setup or Advanced Mode for detailed control, the Join Transformation enables users to quickly create custom datasets, fostering deeper insights and more accessible data analysis.