Getting Started with Unstructured Data in Nexadata

Nexadata's Unstructured Data capability allows you to extract structured tables from PDF files and other supported document formats, including tables that span multiple pages, and turn them into reusable dataset templates. Once a template is configured, it can be used to consistently process invoices, reports, presentations, and other unstructured documents at scale.

Step 1: Navigate to Unstructured Data

From the top navigation, click Setup, then select Unstructured Data from the dropdown menu. This opens the Unstructured Data library, where you can manage existing import workflows or create a new one.

Step 2: Upload Your PDF File

Click Import Unstructured Data in the top right, then select Create New Import Workflow. You will be prompted to provide a Display Name for the workflow and select a file to upload. Click Choose File, select your PDF, then click Submit Upload to begin processing.

Uploading a PDF file and submitting the import workflow

👉 Supported file formats include PDF, PNG, JPEG, and TIFF.

Step 3: Select and Configure Tables

Once the file is processed, Nexadata identifies every table in the document, including those that span multiple pages. To understand how table selection works, it helps to first look at the source PDF. In the example below, the document contains three distinct tables across multiple pages.

The source PDF showing the three tables Nexadata will identify: Invoice Details (1), Detailed Billing spanning multiple pages (2), and the Rate Card (3)

Table 1 is the Invoice Details block, which contains key reference fields like Invoice #, Project #, PO #, and Invoice Date. Table 2 is the Detailed Billing table, the core dataset, and it spans two pages of the PDF. Table 3 is the Rate Card, a summary table on the final page that lists total hours and billable amounts by consultant.

Nexadata's Table Selection step maps directly to these three tables, giving you control over how each one is named, structured, and used. In the view below, you can see how each table in the PDF has been identified and is ready to be configured.

Table Selection view showing a pivoted header table, the main billing table spanning multiple pages, and a supplemental rate card table

Table 1 - Invoice Details is an example of a table that did not have traditional column headers in the source PDF. Nexadata has automatically pivoted the data so that the row labels (Invoice #, Project #, PO #, and Invoice Date) now appear as column headers. Once the table is pivoted, you can click any header to include it as a standalone column in the Main Table, which carries that reference data forward into the dataset.
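To make the pivot concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Nexadata's implementation: the function name `pivot_label_value_rows` and the sample values are hypothetical, and the sketch assumes the block was extracted as simple label/value pairs.

```python
# Hypothetical label/value rows, as a header-less block might be extracted.
# The values are invented for illustration.
invoice_details_rows = [
    ("Invoice #", "INV-1001"),
    ("Project #", "PRJ-42"),
    ("PO #", "PO-7788"),
    ("Invoice Date", "2024-03-31"),
]


def pivot_label_value_rows(rows):
    """Turn (label, value) rows into a single record keyed by label.

    After the pivot, each label behaves like a column header and the
    whole block becomes one row of data.
    """
    return {label.strip(): value.strip() for label, value in rows}


invoice_details = pivot_label_value_rows(invoice_details_rows)

# The former row labels are now the column headers.
print(list(invoice_details))
```

Each key in `invoice_details` corresponds to a header you could click to carry into the Main Table.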

Table 2 - Billing Details is designated as the Main Table. This is the primary dataset that spans multiple pages in the source PDF, and Nexadata has automatically combined those pages into a single continuous table. All data rows, including consultant activity, hours, and descriptions, are unified here and will serve as the foundation for your Dataset Builder template.
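Conceptually, combining a multi-page table means appending each page's rows while keeping the header only once. The Python sketch below shows that idea under stated assumptions: the function `combine_pages` and the sample rows are hypothetical, and it assumes each page fragment repeats the same header row. Nexadata does this for you automatically; the code is not its actual logic.

```python
def combine_pages(pages):
    """Concatenate per-page row lists into one table, keeping the header once.

    Assumes the first row of the first page is the header. Any later row
    identical to the header (a repeated page header) is skipped.
    """
    header = pages[0][0]
    combined = [header]
    for page in pages:
        for row in page:
            if row == header:
                continue  # skip the header repeated at the top of a page
            combined.append(row)
    return combined


# Hypothetical fragments of the same table found on two pages.
page_1 = [
    ["Consultant", "Activity", "Hours"],
    ["A. Rivera", "Discovery workshop", "4.0"],
    ["B. Chen", "Data mapping", "6.5"],
]
page_2 = [
    ["Consultant", "Activity", "Hours"],
    ["A. Rivera", "Pipeline build", "8.0"],
]

billing_details = combine_pages([page_1, page_2])
```

The result is one continuous table with a single header row followed by the data rows from both pages.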

Table 3 - Rate Card is a supplemental table found on a separate page of the document. It is not set as the Main Table, but it is joined to the Main Table in the Dataset Builder (Step 4). This allows Nexadata to use the rate card data, including consultant names, total hours, and billable amounts, to enrich and extend the main table. For example, this join can be used to calculate the billable amount for each activity row based on the corresponding consultant's rate.

Any table you do not need can be removed by checking Discard before saving. Use the Column Types tab on each table to set the appropriate data type for each column, such as text, number, currency, or date.
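Setting a column type determines how the extracted text in that column is interpreted. The following Python sketch illustrates the idea for the four types mentioned above. Everything in it is an assumption made for illustration: the column names, the `MM/DD/YYYY` date format, the `$1,250.00` currency format, and the helper names are hypothetical and do not describe Nexadata's internals.

```python
from datetime import datetime
from decimal import Decimal


def parse_currency(text):
    """Parse a string such as '$1,250.00' into a Decimal (assumed format)."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def parse_date(text):
    """Parse an MM/DD/YYYY string into a date (assumed format)."""
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()


# One converter per column type named in the text above.
CONVERTERS = {
    "text": str.strip,
    "number": float,
    "currency": parse_currency,
    "date": parse_date,
}

# Hypothetical column-type choices for a billing row.
column_types = {
    "Consultant": "text",
    "Hours": "number",
    "Rate": "currency",
    "Date": "date",
}


def apply_column_types(row, types):
    """Convert each raw string in the row using its column's type."""
    return {name: CONVERTERS[types[name]](value) for name, value in row.items()}


raw_row = {
    "Consultant": " A. Rivera ",
    "Hours": "6.5",
    "Rate": "$1,250.00",
    "Date": "03/14/2024",
}
typed_row = apply_column_types(raw_row, column_types)
```

Choosing the right type up front matters because later calculations, such as multiplying hours by a rate, only work on numeric values.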

👍 The Main Table will automatically join with any pivoted tables when the dataset is built.

Step 4: Build Your Dataset

When all tables are configured, click Save Tables. You will then be directed to the Dataset Builder, where you provide a name for your dataset and select the Main Table. Before clicking Start Transforming, notice that the Dataset Builder preview already shows the combined dataset on the right. The columns from the Billing Details Main Table appear alongside the Invoice #, Invoice Date, and PO # columns that were selected from the pivoted Invoice Details table. This confirms that the pivoted columns have been automatically joined to the Main Table and will be carried through into every row of the final dataset.
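The effect of that automatic join can be pictured as copying the selected reference values onto every row of the Main Table. The Python sketch below shows this under stated assumptions: `attach_reference_columns` and all sample values are hypothetical, and the code only mirrors the behavior described here rather than Nexadata's implementation.

```python
def attach_reference_columns(main_rows, reference, selected):
    """Copy the selected single-record values onto every main-table row."""
    extras = {name: reference[name] for name in selected}
    return [{**row, **extras} for row in main_rows]


# Hypothetical pivoted Invoice Details record (one row of reference data).
invoice_details = {
    "Invoice #": "INV-1001",
    "Project #": "PRJ-42",
    "PO #": "PO-7788",
    "Invoice Date": "2024-03-31",
}

# Hypothetical Billing Details rows (the Main Table).
billing_rows = [
    {"Consultant": "A. Rivera", "Activity": "Discovery workshop", "Hours": "4.0"},
    {"Consultant": "B. Chen", "Activity": "Data mapping", "Hours": "6.5"},
]

# Only the headers that were selected are carried forward.
selected_columns = ["Invoice #", "Invoice Date", "PO #"]
combined = attach_reference_columns(billing_rows, invoice_details, selected_columns)
```

Every row in `combined` now carries the same Invoice #, Invoice Date, and PO # values, while the unselected Project # column is left out.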

The Dataset Builder setup view showing the Main Table (Billing Details) with the pivoted Invoice Details columns (Invoice #, Invoice Date, PO #) automatically joined and visible in the dataset preview

Once you have named your dataset and confirmed the Main Table selection, click Start Transforming. From here, you can apply transformations using natural language in the Transform Copilot or use Advanced Mode for more complex logic. In the example below, three transformations have been applied: a join operation to bring in the Rate Card data, a calculation to derive the Amount Billed for each activity row, and a final step to remove any unnecessary columns. The result is a clean, enriched dataset ready to be converted and used in a Pipeline.
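For readers who think in code, the three transformations described above can be sketched in plain Python. This is a hedged illustration, not what the Transform Copilot generates: the `Hourly Rate` column, the function `build_dataset`, and all sample values are assumptions, and the join is assumed to match rows on the consultant name.

```python
from decimal import Decimal

# Hypothetical Main Table rows.
billing_rows = [
    {"Consultant": "A. Rivera", "Activity": "Discovery workshop", "Hours": Decimal("4.0")},
    {"Consultant": "B. Chen", "Activity": "Data mapping", "Hours": Decimal("6.5")},
]

# Hypothetical Rate Card rows; the "Hourly Rate" column is an assumption.
rate_card = [
    {"Consultant": "A. Rivera", "Hourly Rate": Decimal("150.00")},
    {"Consultant": "B. Chen", "Hourly Rate": Decimal("120.00")},
]


def build_dataset(billing_rows, rate_card, drop_columns=("Hourly Rate",)):
    """Join the rate card, compute Amount Billed, then drop unneeded columns."""
    rates = {row["Consultant"]: row["Hourly Rate"] for row in rate_card}
    dataset = []
    for row in billing_rows:
        # Transformation 1: left join on consultant name (None if no match).
        rate = rates.get(row["Consultant"])
        enriched = {**row, "Hourly Rate": rate}
        # Transformation 2: Amount Billed = Hours x Hourly Rate.
        enriched["Amount Billed"] = (
            row["Hours"] * rate if rate is not None else None
        )
        # Transformation 3: remove columns that are no longer needed.
        for name in drop_columns:
            enriched.pop(name, None)
        dataset.append(enriched)
    return dataset


dataset = build_dataset(billing_rows, rate_card)
```

Using a left join keeps every activity row even when a consultant is missing from the rate card, so gaps show up as empty Amount Billed values instead of silently dropped rows.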

A fully built Dataset Builder showing three applied transformations: adding the Rate Card via a join, calculating Amount Billed per row, and removing unnecessary columns

Step 5: Convert to Dataset

When your transformations are complete and the dataset looks correct, click Convert to Dataset in the top right corner. This finalizes the template and makes it available for use in a Pipeline. Once converted, the Dataset Builder template can be reused to process future PDF files of the same format, automatically extracting, joining, and transforming the data each time without any manual reconfiguration.

The Convert to Dataset button in the top right of the Dataset Builder interface

Note: The Dataset Builder name must be unique across your organization. If the name is already in use, you will not be able to convert it to a dataset.

Next Steps

For a detailed walkthrough of configuring the full Unstructured Data workflow, refer to the Unstructured Data Quick Start Guide, which covers step-by-step setup from file upload through dataset creation and processing.