The Filter transformation in Nexadata Pipelines allows you to refine your dataset by including or excluding rows based on specific conditions. This gives you precise control over which records are included in your final output. The transformation supports complex filtering logic using AND and OR operations, along with condition grouping, enabling you to meet a wide range of filtering needs. You can configure this transformation using either Natural Language Mode or Advanced Mode.
Inputs for the Filter Transformation
Name of the Transformation: In Natural Language Mode, the transformation name is automatically generated, but it can be updated later using Advanced Mode. For example, you might rename it to something more descriptive, like "Active Employee Filter".
Select Column to Filter: Choose the column from your dataset to which you want to apply the filter condition.
Choose a Comparison Operator: Select how you want to compare the values in the column.
For text columns (strings), the available operators are:Equal: Filters for values that exactly match the given value.
Not Equal: Filters for values that do not match the given value.
Starts With: Filters for values that begin with the specified string.
Ends With: Filters for values that end with the specified string.
Empty: Filters for rows where the column is empty.
Not Empty: Filters for rows where the column is not empty.
Regular Expression Matches: Filters based on a regular expression match.
String Matches (Like): Filters for values that match a pattern.
For numeric columns, the available operators are:
Equal: Filters for values that exactly match the given value.
Not Equal: Filters for values that do not match the given value.
Equality Evaluators: Filters for values that match based on being greater than (>), greater than or equal (≥), less than (<), and less than or equal (≤)
Define the Comparison Support: The filter can compare the selected column against:
Value: A static value you manually input (e.g., a number, string, or date).
Column: Another column in the dataset. The comparison evaluates how the selected column relates to values in the comparison column.
Percentage: A percentage value relative to the data (useful for comparing numbers or calculating thresholds).
Input the Comparison Value: Enter the value, column, or percentage to define what the selected column will be filtered by.
Using Natural Language Mode
In Natural Language Mode, you can apply filters using a single instruction. Nexadata will automatically generate the filter transformation based on your natural language input. The transformation will also be automatically named, but you can update it later in Advanced Mode.
Example Instructions in Natural Language Mode:
Filter out rows where Employee Status is not Full-time.
Filter out rows where Employee Status is equal to Inactive.
Filter the dataset to include only rows where the Department column equals Sales and the Status column is Active.
Filter out rows where Department is equal to HR.
Filter out rows where Employee Tenure is less than 2 years.
Note: If Natural Language Mode doesn’t perfectly interpret your instructions, you can switch to Advanced Mode to fine-tune the transformation.
Using Advanced Mode
In Advanced Mode, you have full control over how the filter is set up. This mode allows you to define multiple conditions and group conditions and specify the logic operators to be used (AND/OR).
Steps in Advanced Mode:
Name the Transformation: Enter or update a custom name, such as "Filter Active Employees".
Select Column to Filter: Choose the column in your dataset that you want to filter.
Choose a Comparison Operator: Pick an operator like Equal, Not Equal, or Starts With to define how the values should be compared.
Define the Comparison Support: Choose Value, Column, or Percentage for the comparison.
Input the Comparison Value: Provide the specific value, column, or percentage to filter against.
Group and Manage Conditions: Add additional conditions as needed, combining them with AND/OR logic. Group conditions to build more complex filters.
Example in Advanced Mode:
Name of the Transformation: Filter Active Employees
Select Column to Filter: Employee_Status
Comparison Operator: Equal
Comparison Support: Value
Comparison Value: Active
Logic: AND
Advanced Mode enables you to set up detailed, multi-condition filters, ensuring greater flexibility in filtering your data.
Managing Multiple Conditions
You can add multiple conditions to refine your dataset further. Each condition can be connected using AND or OR logic:
AND: All conditions must be true for the record to be included.
OR: At least one condition must be true for the record to be included.
Grouping Conditions:
Conditions can be grouped to create advanced logic. For instance, you might want to filter rows where "Sales" > 1000 AND "Region" = "North", or where "Category" = "Electronics" OR "Category" = "Furniture".
Example Use Case
Imagine you have a dataset with columns for "Region", "Sales", and "Product Category", and you want to filter the rows based on the following:
The "Sales" value is greater than 1000.
"Product Category" starts with the letter "E".
The "Region" is either "North" or "West".
In this case, you would set up the conditions using AND and OR logic to create a highly refined dataset for analysis.
Summary
The Filter transformation in Nexadata Pipelines is a powerful tool that lets you refine your dataset by applying conditions, enabling precise control over which records are included in your final output. Whether you use Natural Language Mode for quick setup or Advanced Mode for more complex filtering, this transformation's flexibility ensures you can meet a wide range of data filtering needs.