Nexadata lets you create new datasets for use in your data pipelines. This guide walks through adding a dataset: naming it, selecting a data connection, choosing a data format, and entering the connection details.
Step-by-step Instructions
Step 1: Open the "Create New Dataset" Form
Navigate to the Nexadata dashboard and click on Add New Dataset.
This will bring up the dataset creation form.
Step 2: Enter a Name for Your Dataset
Enter a name in the Name field. The name does not have to be unique, but a unique name is recommended so the dataset is easy to tell apart from others.
Step 3: Select Data Connection
From the Data Connection dropdown, choose the appropriate connection. You may see options like "sample data" or your organization’s available connections.
Step 4: Choose Data Format
Currently, Nexadata supports the following data formats:
Delimited data: CSV (comma-separated values), TSV (tab-separated values), and semicolon-separated values.
Support for JSON and Parquet formats is coming soon, so they cannot be used at this time.
Step 5: Specify Details Based on the Data Connection
Data Connection | Connection Details |
Nexadata File Manager | In the File dropdown, begin typing the file name and select it from the suggestions. |
Amazon S3 | 1) In the S3 Bucket dropdown, start typing the name of the bucket where your data file is stored and select it from the list. 2) In the S3 Path field, enter the full path to your data file within the selected S3 bucket. |
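If you only have the file's location as a full S3 URI, the bucket name and the path within the bucket can be separated before filling in the two fields. The following is a minimal sketch, not part of Nexadata: the function name split_s3_uri and the example URI are hypothetical, and it assumes the S3 Path field expects the object key relative to the bucket. Whether Nexadata expects a leading slash is not stated in this guide.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/path/to/file' into (bucket, path_within_bucket).

    Hypothetical helper for illustration only; it does not contact AWS
    and does not check that the bucket or file exists.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the path.
    bucket, _, path = uri[len(prefix):].partition("/")
    if not bucket or not path:
        raise ValueError(f"URI must contain a bucket and a path: {uri!r}")
    return bucket, path


# Example with a made-up URI: the first value goes in S3 Bucket,
# the second in S3 Path (under the assumption stated above).
bucket, path = split_s3_uri("s3://example-bucket/raw/2024/sales.csv")
print(bucket)
print(path)
```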
Step 6: Submit the Form
Once all fields are filled, click the Submit button at the bottom of the form.
Nexadata will now register your new dataset.
Coming Soon: JSON and Parquet Support
Nexadata will soon support JSON and Parquet data formats. Once available, you can choose these options in the Data Format section.
Troubleshooting
If the S3 Bucket or S3 Path fields are not populating, ensure that your AWS permissions are correctly configured.
Check that your data file matches the data format you selected (for example, that a file registered as CSV is actually comma-delimited and not semicolon- or tab-delimited).
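A quick local check can show which of the three supported delimiters a file appears to use before you register it. This is a rough sketch, not Nexadata's own detection logic: the names CANDIDATES and guess_format are hypothetical, and the check simply looks for a delimiter that occurs the same non-zero number of times on every sampled line. It ignores quoting, so fields that contain the delimiter inside quotes can make it return no result.

```python
# Delimiters for the three delimited formats listed in this guide.
CANDIDATES = {"CSV": ",", "TSV": "\t", "Semicolon": ";"}


def guess_format(text, max_lines=20):
    """Return the format name whose delimiter count is identical and
    non-zero on every sampled non-blank line, or None if none fits."""
    lines = [ln for ln in text.splitlines()[:max_lines] if ln.strip()]
    if not lines:
        return None
    best_name, best_count = None, 0
    for name, delim in CANDIDATES.items():
        counts = {ln.count(delim) for ln in lines}
        if len(counts) == 1:  # same count on every line
            count = counts.pop()
            if count > best_count:  # prefer the delimiter used most often
                best_name, best_count = name, count
    return best_name


# Example with inline sample data; for a real file, read its first
# lines into a string and pass that in instead.
sample = "id;name;amount\n1;Ada;10\n2;Bo;12\n"
print(guess_format(sample))
```

If the result differs from the format chosen in Step 4, or is None, the file is worth inspecting by hand before submitting the form.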