Once you've successfully connected your Amazon S3 bucket to Nexadata, you can begin using the connection to bring data into Nexadata (via Datasets) or send data back to S3 (via Pipeline Outputs). This article will guide you through both processes.
Here are the key capabilities:
Import data from S3 by selecting your bucket and specifying a file path.
Export transformed data to S3 using either:
Dynamic output (appends a timestamp to each file), or
Static output (overwrites the same file).
Query available S3 buckets and browse files within your connection.
Customize output paths and formats as needed for your workflows.
Let's walk through each setup process step by step.
Part 1: Creating a Dataset from S3
Use this process to ingest data stored in S3 into Nexadata for use in Pipelines.
Step 1: Create a New Dataset
Navigate to the Datasets section in Nexadata.
Click Create Dataset.
Under Data Connection, select your configured S3 connection (e.g., Nexadata Demo S3).
Step 2: Configure Dataset Properties
Name: Provide a unique name for your dataset.
Data Connection: Ensure your Amazon S3 connection is selected.
Step 3: Select Data Format
Choose the format of your file stored in S3:
Tabular (CSV or TXT)
JSON (coming soon)
Parquet (coming soon)
Step 4: Specify S3 Bucket and File Path
S3 Bucket: Start typing to search available buckets (e.g., nexadata-demo).
S3 Path: Provide the full file name (e.g., SIMPLE_FACT.txt).
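To see how the two fields in this step relate, here is a small illustrative sketch (not part of Nexadata itself) that splits a full s3:// URI into the bucket and path values you would enter separately; the `split_s3_uri` helper name is ours, not from the product.

```python
def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key).

    Illustrative helper only -- Nexadata asks for the bucket and
    path in separate fields; this just shows how the parts relate.
    """
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything before the first "/" is the bucket; the rest is the path.
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key

# The bucket and path from this step, expressed as one URI:
print(split_s3_uri("s3://nexadata-demo/SIMPLE_FACT.txt"))
# → ('nexadata-demo', 'SIMPLE_FACT.txt')
```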
Step 5: Set Delimiter (for Tabular)
Choose your delimiter:
Comma (,)
Tab
Semicolon (;)
Pipe (|)
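If you're unsure which delimiter a file uses, you can check a sample of it locally before configuring the dataset. A minimal sketch using Python's standard-library csv.Sniffer, restricted to the four delimiters Nexadata supports (the sample data here is made up):

```python
import csv

# A few lines from a hypothetical pipe-delimited file.
sample = "id|name|amount\n1|widget|9.99\n2|gadget|14.50\n"

# Sniffer guesses the dialect from the sample; we limit candidates
# to the delimiters Nexadata accepts: comma, tab, semicolon, pipe.
dialect = csv.Sniffer().sniff(sample, delimiters=",\t;|")
print(repr(dialect.delimiter))  # → '|'
```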
Step 6: Submit the Dataset
Review the setup and click Submit.
The dataset is now ready to use in any Nexadata pipeline.
Part 2: Configuring an S3 Output in a Pipeline
Once your Nexadata pipeline is complete, use S3 as an output destination to store your transformed results.
Step 1: Add an Output to the Pipeline
Navigate to the Customized Outputs section under the Pipeline Setup tab.
Click Add Output and select your configured Amazon S3 connection.
Step 2: Choose Output File Type
Select how the file will be written to S3:
Dynamic (recommended): Adds a timestamp to each file name.
Static: Replaces the file at the specified S3 path on each run.
Tip: Dynamic is ideal for versioned outputs. Static is best for overwriting the same reference file.
Step 3: Provide S3 Details
S3 Bucket: Enter your S3 bucket name (e.g., nexadata-demo).
S3 Path: Provide the full path with filename (e.g., exports/report.csv).
Step 4: Save the Output
Click Save to confirm and register the output configuration with your pipeline.
Tips and Best Practices
Dynamic Outputs help preserve historical pipeline results by creating timestamped files.
Use Static Outputs only when you're sure overwriting is acceptable.
Always ensure proper IAM permissions to list and write to S3 buckets.
Organize S3 paths by project or date for easier tracking.
Double-check file formats and delimiters when importing tabular data.
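On the IAM point above: a minimal policy for the connection needs list access on the bucket and read/write access on its objects. The fragment below is a sketch using the article's example bucket name; adjust the resource ARNs and actions to your own bucket and security requirements.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::nexadata-demo"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::nexadata-demo/*"
    }
  ]
}
```

Note that ListBucket applies to the bucket ARN itself, while GetObject and PutObject apply to the objects (the `/*` suffix); mixing these up is a common cause of "access denied" errors when browsing or writing files.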