Using an Amazon S3 Connection

Learn how to use the Amazon S3 connection to create a Nexadata Dataset and write transformed results back to S3.

Written by Lourens Kok
Updated over a week ago

Once you've successfully connected your Amazon S3 bucket to Nexadata, you can begin using the connection to bring data into Nexadata (via Datasets) or send data back to S3 (via Pipeline Outputs). This article will guide you through both processes.

Here are the key capabilities:

  • Import data from S3 by selecting your bucket and specifying a file path.

  • Export transformed data to S3 using either:

    • Dynamic output (adds a timestamp to each file name, so every run creates a new file), or

    • Static output (overwrites the same file on every run).

  • Query available S3 buckets and browse files within your connection.

  • Customize output paths and formats as needed for your workflows.

Let's walk through each setup process step by step.


Part 1: Creating a Dataset from S3

Use this process to ingest data stored in S3 into Nexadata for use in Pipelines.

Step 1: Create a New Dataset

  1. Navigate to the Datasets section in Nexadata.

  2. Click Create Dataset.

  3. Under Data Connection, select your configured S3 connection (e.g., Nexadata Demo S3).

Step 2: Configure Dataset Properties

  • Name: Provide a unique name for your dataset.

  • Data Connection: Ensure your Amazon S3 connection is selected.

Step 3: Select Data Format

Choose the format of your file stored in S3:

  • Tabular (CSV or TXT)

  • JSON (coming soon)

  • Parquet (coming soon)

Step 4: Specify S3 Bucket and File Path

  • Start typing to search available buckets (e.g., nexadata-demo).

  • In S3 Path, provide the full path to the file within the bucket, including the file name and extension (e.g., SIMPLE_FACT.txt).

Step 5: Set Delimiter (for Tabular)

Choose your delimiter:

  • Comma (,)

  • Tab

  • Semicolon (;)

  • Pipe (|)
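If you are unsure which delimiter a file uses, you can check a few sample rows locally before submitting the Dataset. The following is a minimal Python sketch using only the standard library. It is not part of Nexadata: the `detect_delimiter` helper and the sample rows are hypothetical. It tries each of the four delimiter options above and returns the one that splits every row into the same number of columns (more than one).

```python
import csv
import io
from typing import Optional

# The four delimiter options offered in Step 5.
CANDIDATES = {"Comma": ",", "Tab": "\t", "Semicolon": ";", "Pipe": "|"}


def detect_delimiter(sample: str) -> Optional[str]:
    """Return the name of the delimiter that gives a consistent column
    count greater than one across all non-empty rows, or None."""
    best_name, best_cols = None, 1
    for name, delim in CANDIDATES.items():
        rows = [r for r in csv.reader(io.StringIO(sample), delimiter=delim) if r]
        widths = {len(r) for r in rows}
        if len(widths) == 1:  # every row has the same number of columns
            cols = widths.pop()
            if cols > best_cols:  # prefer the delimiter yielding more columns
                best_name, best_cols = name, cols
    return best_name


# Hypothetical sample rows; in practice, paste the first lines of your file.
sample = "ACCOUNT|PERIOD|AMOUNT\n1000|2024-01|250.00\n2000|2024-01|-75.50\n"
print(detect_delimiter(sample))  # prints: Pipe
```

If the helper returns None, the rows have inconsistent column counts under every option, which usually means the file needs cleaning or uses a delimiter that is not in the list.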

Step 6: Submit the Dataset

  • Review the setup and click Submit.

  • The dataset is now ready to use in any Nexadata pipeline.


Part 2: Configuring an S3 Output in a Pipeline

Once your Nexadata pipeline is complete, use S3 as an output destination to store your transformed results.

Step 1: Add an Output to the Pipeline

  1. Navigate to the Customized Outputs section under the Pipeline Setup tab.

  2. Click Add Output and select your configured Amazon S3 connection.

Step 2: Choose Output File Type

Select how the file will be written to S3:

  • Dynamic (recommended): Adds a timestamp to each file name.

  • Static: Replaces the file at the specified S3 path on each run.

📌 Dynamic is ideal for versioned outputs. Static is best for overwriting the same reference file.
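To make the difference concrete, here is a minimal Python sketch of how the two modes could map a configured S3 path to the file that gets written. The exact timestamp format and position that Nexadata uses are not stated in this article, so the `_YYYYMMDD_HHMMSS` suffix placed before the extension is an assumption, and the `output_key` helper is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def output_key(s3_path: str, mode: str, now: datetime) -> str:
    """Return the S3 key a pipeline run would write to (illustrative only)."""
    if mode == "static":
        # Static: the key never changes, so each run replaces the previous file.
        return s3_path
    if mode == "dynamic":
        # Dynamic: a timestamp is added to the file name (assumed format),
        # so each run produces a new file and earlier results are kept.
        path = PurePosixPath(s3_path)
        stamp = now.strftime("%Y%m%d_%H%M%S")
        return str(path.with_name(f"{path.stem}_{stamp}{path.suffix}"))
    raise ValueError(f"unknown mode: {mode!r}")


run_time = datetime(2024, 1, 31, 9, 30, 0, tzinfo=timezone.utc)
print(output_key("exports/report.csv", "static", run_time))
# prints: exports/report.csv
print(output_key("exports/report.csv", "dynamic", run_time))
# prints: exports/report_20240131_093000.csv
```

Under this assumption, two Static runs write to the same key, while two Dynamic runs at different times write to two different keys.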

Step 3: Provide S3 Details

  • S3 Bucket: Enter your S3 bucket name (e.g., nexadata-demo).

  • S3 Path: Provide the full path with filename (e.g., exports/report.csv).

Step 4: Save the Output

Click Save to confirm and register the output configuration with your pipeline.


Tips and Best Practices

  • Dynamic Outputs help preserve historical pipeline results by creating timestamped files.

  • Use Static Outputs only when you're sure overwriting is acceptable.

  • Make sure the connection's IAM permissions allow listing buckets, reading files (for Datasets), and writing files (for Pipeline Outputs).

  • Organize S3 paths by project or date for easier tracking.

  • Double-check file formats and delimiters when importing tabular data.
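As a starting point for the IAM tip above, the sketch below builds an example policy document in Python. The action names are standard Amazon S3 IAM actions, but which ones Nexadata actually requires is an assumption here, as is the `build_s3_policy` helper. The `nexadata-demo` bucket name is taken from the examples in this article. Review the result with your AWS administrator before applying it.

```python
import json


def build_s3_policy(bucket: str) -> dict:
    """Build an example IAM policy for one bucket (assumed permission set)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumed: needed to search available buckets.
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {
                # Assumed: needed to browse files inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Assumed: read for Datasets, write for Pipeline Outputs.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(build_s3_policy("nexadata-demo"), indent=2))
```

Note that bucket-level actions such as listing apply to the bucket ARN, while object-level actions such as reading and writing apply to the ARN ending in `/*`.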
