Nexadata currently supports tabular data in CSV format and plans to support JSON and Parquet formats soon. This article explains each of these formats, details Nexadata's current capabilities with CSV data, including delimiter options, and provides format samples.
Supported Format: Tabular (CSV)
CSV (Comma-Separated Values) is a tabular format that organizes data into rows and columns, using a consistent delimiter to separate fields within each row. Nexadata supports proper (well-formed) CSV files, that is, files that follow the formatting rules below so that every record can be parsed unambiguously. A proper CSV file has:
Consistent Delimiters: Each field in a row is separated by a consistent delimiter (comma, tab, or semicolon in Nexadata).
Quoted Strings: If a field contains the delimiter character itself, that field should be enclosed in double-quotes.
No Extraneous Characters: Records contain no stray characters outside their fields, and a line break appears inside a field only when that field is enclosed in double quotes.
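The quoting rules above can be illustrated with Python's standard csv module. This is a minimal sketch and not Nexadata's own parser; the sample rows and the "Note" column are hypothetical and exist only to show a field containing the delimiter and a field containing a line break.

```python
import csv
import io

# Hypothetical sample rows: one field contains a comma, another a line break.
rows = [
    ["Name", "Note"],
    ["John Doe", "Lives in New York, NY"],
    ["Jane Smith", "Line one\nLine two"],
]

buffer = io.StringIO()
# QUOTE_MINIMAL quotes only the fields that need it: those containing
# the delimiter, a double quote, or a line-terminator character.
writer = csv.writer(
    buffer, delimiter=",", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original fields unchanged,
# including the embedded comma and the embedded line break.
parsed = list(csv.reader(io.StringIO(csv_text), delimiter=","))
print(parsed == rows)
```

Fields without special characters are written unquoted, which is still valid; quoting every field, as in the sample further down, is also acceptable.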
Nexadata's CSV Delimiter Options
Nexadata supports the following delimiters in CSV files:
Comma ( , )
Tab
Semicolon ( ; )
Note: Additional delimiter options are planned for future updates.
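Before uploading a file, it can help to confirm that every record has the same number of fields under the delimiter you intend to use. The following is a rough pre-flight check written with Python's standard csv module; the function name inconsistent_rows and the sample strings are hypothetical, and the check is not part of Nexadata.

```python
import csv
import io

# The three delimiters listed above; the dictionary keys are illustrative names.
DELIMITERS = {"comma": ",", "tab": "\t", "semicolon": ";"}


def inconsistent_rows(text, delimiter):
    """Return 1-based record numbers whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to check
    return [
        number
        for number, record in enumerate(reader, start=2)
        if len(record) != len(header)
    ]


# Record 2 contains a quoted semicolon (fine); record 3 is missing a field.
semicolon_text = 'Name;Age;City\n"Doe; John";29;New York\nJane Smith;34\n'
tab_text = "Name\tAge\tCity\nJohn Doe\t29\tNew York\n"

print(inconsistent_rows(semicolon_text, DELIMITERS["semicolon"]))  # [3]
print(inconsistent_rows(tab_text, DELIMITERS["tab"]))              # []
```

The numbers returned count records rather than physical lines, so they can differ from line numbers when a quoted field spans several lines.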
Example of Proper CSV Format
With a comma delimiter:
Name,Age,City
"John Doe",29,"New York"
"Jane Smith",34,"Los Angeles"
Coming Soon: JSON
JSON (JavaScript Object Notation) is a flexible, lightweight format for representing structured data. JSON data is organized into key-value pairs (objects) and ordered lists (arrays), both of which can be nested, making it well suited to hierarchical data.
Benefits of JSON:
Human-readable and easy to interpret
Supports nested structures for complex data
Example JSON Format
[
  {
    "Name": "John Doe",
    "Age": 29,
    "City": "New York"
  },
  {
    "Name": "Jane Smith",
    "Age": 34,
    "City": "Los Angeles"
  }
]
JSON format support in Nexadata is currently under development and will be available soon.
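To make the structure concrete, the sample above can be parsed with Python's standard json module. This sketch only illustrates the format and says nothing about how Nexadata will ingest JSON; the nested "Address" object is a hypothetical addition used to show nesting.

```python
import json

sample = """
[
  {"Name": "John Doe", "Age": 29, "City": "New York"},
  {"Name": "Jane Smith", "Age": 34, "City": "Los Angeles"}
]
"""

records = json.loads(sample)

# Unlike CSV, JSON distinguishes numbers from strings, so Age is parsed as an int.
ages = [record["Age"] for record in records]
print(ages)

# Hypothetical nested record: a value can itself be an object.
nested = json.loads(
    '{"Name": "John Doe", "Address": {"City": "New York", "State": "NY"}}'
)
city = nested["Address"]["City"]
print(city)
```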
Coming Soon: Parquet
Parquet is a columnar storage format optimized for large-scale data processing. It is widely used in data lakes and big data platforms because storing each column's values together allows efficient compression and column-wise reads, including for complex (nested) data types.
Benefits of Parquet:
Highly compressed, leading to lower storage costs
Optimized for analytical workloads and large datasets
Example Parquet Format
Parquet is a binary format and is not human-readable, so the sample below shows the logical table that a Parquet file would store rather than the file's actual contents:
Name       | Age | City
John Doe   | 29  | New York
Jane Smith | 34  | Los Angeles
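The idea behind columnar storage can be sketched in plain Python by pivoting the rows of the table above into one list per column. This is a conceptual illustration only: it does not read or write Parquet, and the real format adds row groups, encodings, compression, and metadata on top of this idea. The helper name to_columns is hypothetical.

```python
# Row-oriented view: one dictionary per record, as in CSV or JSON.
rows = [
    {"Name": "John Doe", "Age": 29, "City": "New York"},
    {"Name": "Jane Smith", "Age": 34, "City": "Los Angeles"},
]


def to_columns(rows):
    """Pivot a list of row dictionaries into one list of values per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)
print(columns)

# An analytical query such as an average needs only the Age column;
# the Name and City values are never touched.
average_age = sum(columns["Age"]) / len(columns["Age"])
print(average_age)  # 31.5
```

Because values of the same type sit next to each other in a column, they tend to compress well, which is the basis for the storage benefit listed above.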
Parquet format support in Nexadata is also in progress and will be available in a future release.