Supported Data Formats in Nexadata

Overview of Nexadata's four supported Data Formats: Tabular (CSV/TSV), Spreadsheet (XLSX), JSON, and Parquet, with examples for each.

Nexadata supports four Data Formats for file-based Datasets: Tabular, Spreadsheet, JSON, and Parquet (JSON and Parquet are currently marked as coming soon). You choose the format on the Connect Data step when creating or editing a Dataset. This article explains each format, the file types it covers, and the conventions Nexadata expects so your data ingests cleanly. The formats work with Nexadata's file-based connections, including Nexadata Hub, Amazon S3, and SFTP.

Tabular

Tabular data refers to delimited text files that organize records into rows and columns using a consistent character to separate fields. The most common tabular format is CSV (Comma-Separated Values), but Nexadata also supports tab-delimited and semicolon-delimited files under the same Tabular setting.

A proper tabular file includes:

  1. Consistent Delimiters: Each field in a row is separated by the same delimiter (comma, tab, or semicolon in Nexadata).

  2. Quoted Strings: If a field contains the delimiter character itself, that field should be enclosed in double quotes.

  3. No Extraneous Characters: Records contain no stray characters, and a line break inside a field is allowed only when that field is enclosed in double quotes.

Supported Tabular Delimiters

Nexadata supports the following delimiters:

  • Comma ( , )

  • Tab

  • Semicolon ( ; )

Note: Additional delimiter options are planned for future updates.

Example of Proper Tabular Format

With a comma delimiter:

Name,Age,City
"John Doe",29,"New York"
"Jane Smith",34,"Los Angeles"

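The conventions above can be checked locally before a file is uploaded. The following is a minimal sketch using Python's standard csv module, not Nexadata's own validation logic; the helper name check_tabular and the sample text are hypothetical. It confirms that every record has the same number of fields as the header and shows that a quoted field may safely contain the delimiter.

```python
import csv
import io


def check_tabular(text, delimiter=","):
    """Return (header, records) if every record matches the header's field count.

    Hypothetical helper for illustration; it is not part of Nexadata.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    if not rows:
        raise ValueError("file is empty")
    header, records = rows[0], rows[1:]
    for record_no, record in enumerate(records, start=1):
        if len(record) != len(header):
            raise ValueError(
                f"record {record_no} has {len(record)} fields, "
                f"expected {len(header)}"
            )
    return header, records


# The first record's name contains a comma, so it must be quoted.
sample = 'Name,Age,City\n"Doe, John",29,"New York"\n"Jane Smith",34,"Los Angeles"\n'
header, records = check_tabular(sample)
print(header)
print(records[0])
```

For a tab-delimited or semicolon-delimited file, the same sketch can be called with delimiter="\t" or delimiter=";".
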
Spreadsheet

Spreadsheet format refers to Excel-style workbook files (.xlsx, .xls, .xlsm). Unlike a flat tabular file, a workbook can contain multiple sheets, formatting, formulas, banner rows, title blocks, and other non-tabular content. Nexadata's Spreadsheet ingestion is built to handle that complexity by letting you specify which sheet to read, where the data region begins on that sheet, and whether to auto-detect or hard-code the size of the data block.

Common use cases include:

  • Monthly or quarterly reports exported from BI tools, ERPs, or planning platforms that include a logo and title rows above the data

  • Multi-sheet workbooks where only one tab contains the data you want to load

  • Files with subheaders, merged cells, or notes between the header row and the first data row

For step-by-step instructions on configuring the Spreadsheet-specific settings (Sheet Name, Anchor Cell, Header Offset, Dynamic Range, and fixed Rows and Columns), see Setting Up a Spreadsheet Dataset.

Example Spreadsheet Layout

A typical workbook sheet looks like this when opened in Excel:

     A                      B     C
1    (company logo)
2    Monthly Sales Report
3    November 2024
4
5    Name                   Age   City
6    John Doe               29    New York
7    Jane Smith             34    Los Angeles

In this example, you would set the Anchor Cell to A5 so Nexadata begins reading from the header row, ignoring the logo and title rows above.
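To make the Anchor Cell idea concrete, here is a minimal sketch in Python of how an A1-style reference such as A5 could be turned into row and column offsets and used to skip the banner rows. This is an illustration under stated assumptions, not Nexadata's implementation: the function names parse_anchor and read_from_anchor are hypothetical, and the sheet is modeled as a plain list of rows rather than a real workbook.

```python
import re


def parse_anchor(cell):
    """Convert an A1-style reference such as 'A5' to zero-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", cell)
    if match is None:
        raise ValueError(f"not a valid cell reference: {cell!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters are a base-26 number where A=1 ... Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


def read_from_anchor(grid, anchor):
    """Return (header, rows) for the region that starts at the anchor cell."""
    row, col = parse_anchor(anchor)
    region = [r[col:] for r in grid[row:]]
    if not region:
        raise ValueError(f"anchor {anchor!r} is outside the sheet")
    return region[0], region[1:]


# Hypothetical in-memory copy of the example sheet; None marks an empty cell.
sheet = [
    ["(company logo)", None, None],
    ["Monthly Sales Report", None, None],
    ["November 2024", None, None],
    [None, None, None],
    ["Name", "Age", "City"],
    ["John Doe", 29, "New York"],
    ["Jane Smith", 34, "Los Angeles"],
]
header, rows = read_from_anchor(sheet, "A5")
print(header)
print(rows)
```

With the anchor at A5, the logo, title, and blank rows are never read, and the first row of the region is treated as the header.
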


JSON (coming soon)

JSON (JavaScript Object Notation) is a flexible, lightweight format for representing structured data. JSON organizes data into key-value pairs and supports nested structures, making it well suited for data sourced from APIs and modern applications.

Benefits of JSON:

  • Human-readable and easy to interpret

  • Supports nested structures for complex data

  • Native format for most modern web APIs and SaaS exports

Example JSON Format

[
  {
    "Name": "John Doe",
    "Age": 29,
    "City": "New York"
  },
  {
    "Name": "Jane Smith",
    "Age": 34,
    "City": "Los Angeles"
  }
]
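A file shaped like the example above can be sanity-checked with Python's standard json module. The sketch below is only an illustration: the helper name load_records is hypothetical, and the requirement that every object carries the same keys is an assumption, since this article does not state what Nexadata expects from JSON input.

```python
import json

raw = """
[
  {"Name": "John Doe", "Age": 29, "City": "New York"},
  {"Name": "Jane Smith", "Age": 34, "City": "Los Angeles"}
]
"""


def load_records(text):
    """Parse a JSON array of objects and check that all objects share one key set.

    The same-keys rule is an assumption made for this sketch.
    """
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(r, dict) for r in data):
        raise ValueError("expected a JSON array of objects")
    if data:
        expected = set(data[0])
        for index, record in enumerate(data[1:], start=1):
            if set(record) != expected:
                raise ValueError(f"record {index} has different keys than record 0")
    return data


records = load_records(raw)
print(len(records), sorted(records[0]))
```
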

Parquet (coming soon)

Parquet is a columnar storage format optimized for large-scale data processing. It is widely used in data lakes and big data platforms because of its efficient storage and strong handling of complex data types.

Benefits of Parquet:

  • Highly compressed, leading to lower storage costs

  • Optimized for analytical workloads and large datasets

  • Schema is embedded in the file, ensuring type consistency between source and target

Example Parquet Format

Parquet is a binary format and is not typically human-readable. Its logical structure looks like this:

Name          Age   City
John Doe      29    New York
Jane Smith    34    Los Angeles
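What "columnar" means can be illustrated without a Parquet library. The sketch below, in plain Python, pivots the same two records from a row-oriented layout into one list per column; the function name to_columns is hypothetical. It shows only the column orientation. A real Parquet file adds row groups, encodings, compression, and an embedded schema on top of this idea.

```python
rows = [
    {"Name": "John Doe", "Age": 29, "City": "New York"},
    {"Name": "Jane Smith", "Age": 34, "City": "Los Angeles"},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict mapping column name to values."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)
# An analytical query that needs only Age touches a single list.
print(columns["Age"])
```
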


Allow Any File Type

By default, Nexadata uses the file extension (such as .csv, .xlsx, .json, .parquet) to validate that a file matches the selected Data Format. The Allow Any File Type toggle on the Connect Data step lets you bypass this check so you can load files that have no extension or an unexpected one. Turn this on only when your source system produces files without recognizable extensions, since it disables the safety check that prevents format mismatches.
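The effect of the toggle can be sketched in a few lines of Python. This is a hypothetical model of the check, not Nexadata's code: the function name extension_ok, the allow_any_file_type parameter, and the exact extension sets (in particular .tsv for Tabular) are assumptions based on the file types mentioned in this article.

```python
from pathlib import PurePosixPath

# Assumed mapping for illustration; the exact list of extensions Nexadata
# accepts for each Data Format is not spelled out in this article.
EXTENSIONS = {
    "Tabular": {".csv", ".tsv"},
    "Spreadsheet": {".xlsx", ".xls", ".xlsm"},
    "JSON": {".json"},
    "Parquet": {".parquet"},
}


def extension_ok(filename, data_format, allow_any_file_type=False):
    """Return True if the file may be loaded under the selected Data Format."""
    if allow_any_file_type:
        # The toggle bypasses the extension check entirely.
        return True
    suffix = PurePosixPath(filename).suffix.lower()
    return suffix in EXTENSIONS[data_format]


print(extension_ok("sales.csv", "Tabular"))
print(extension_ok("export", "Tabular"))
print(extension_ok("export", "Tabular", allow_any_file_type=True))
```

In this model, a file named export with no extension is rejected by default and accepted only once the toggle is on, which is also why a mismatched file can slip through when the check is disabled.
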
