
AI Document Parser using Parse

What this guide covers

AI Document Parser (Parse) is the dashboard setup that turns a JSON Document Schema into AI-powered structured extraction from PDFs. You create an Analyzer, pick the Parse type, and describe each field you want with a fieldName, fieldType, and fieldDescription. The AI reads the document semantically, so no drawn capture areas or regular expressions are required. Once saved, the same Analyzer Id runs from the REST API, Make, Zapier, Power Automate, and n8n.


Authenticating Your Setup

AI Document Parser analyzers are created in the PDF4me developer dashboard. Sign in with your account, then create or copy an API key for the AI Parser API calls that reference the analyzer you build here.

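Dashboard URLs you will use in this guide: dev.pdf4me.com (sign-in, Step 1) and https://developer.pdf4me.com/ai-document-parser/?id=<your-analyzer-guid> (Analyzer detail page, Step 3).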

Important Facts You Should Not Miss

Parse extracts data; Classify routes documents
Pick Parse when you need structured JSON output with one value per defined field. Pick Classify when you only need a category label. The two analyzer types share the same Add modal but produce different output shapes.
fieldDescription drives accuracy
The AI relies on the fieldDescription string to find each value in the document. Be specific and include alternate names ("Sales Order Number, sometimes shown as SO No."). Vague descriptions reduce extraction quality.
Analyzer Id is the stable production reference
The string you type in the Add row (for example purchase_order_parser) is what every API call and automation module uses. Pick a clear naming convention up front; changing it later means rewiring every workflow that points to it.

Step 1: Open AI Document Parser in the dashboard

  1. Sign in at dev.pdf4me.com.
  2. From the dashboard sidebar, click AI Document Parser.
  3. The list page shows every existing Analyzer with three columns: Analyzer Id, Analyzer Type (Parse or Classify), and Actions.
  4. Click the blue + Add button to start a new Analyzer.
PDF4me AI Document Parser dashboard page with sidebar item AI Document Parser highlighted in red, main panel titled AI Document Parser, a blue Add button, and an empty list with column headers Analyzer Id, Analyzer Type, and Actions

AI Document Parser list view. Click + Add to create a new Analyzer.

Step 2: Pick analyzer type Parse and name it

A new row appears with three controls:

  1. Analyzer Id input: type any clear identifier you will remember, for example purchase order parser, Invoice Parser v1, or vendor_statement_parser. There is no naming format restriction: snake_case, camelCase, kebab-case, and plain words with spaces all work the same.
  2. Analyzer Type dropdown: pick Parse (this guide) or Classify (see the Classify guide).
  3. Save / Cancel buttons: Save creates the Analyzer; Cancel discards the row.
PDF4me AI Document Parser Add row open with Analyzer Id input filled with purchase_order_parser, Analyzer Type dropdown showing Classify and Parse options with Parse selected, and Save and Cancel buttons on the right

Add row: type an Analyzer Id, pick Parse, then Save.

Naming tip: include the document family in the name (invoice parser, purchase_order_parser, Shipping Note Parser). The Analyzer Id is what downstream automations call; a clear name pays off every time you wire it into a new platform. Any casing or separator style works.

Step 3: Open the Analyzer and add a Schema

Click the new row to open the detail page. The URL pattern is:

https://developer.pdf4me.com/ai-document-parser/?id=<your-analyzer-guid>

The dashboard issues a GUID per Analyzer the first time you open it. Bookmark this URL to jump straight back to the same Analyzer next time.

The detail page shows:

  • Parse Info (left): shows the Analyzer Id you typed in Step 2. This panel is read-only.
  • Schemas (right): empty by default. Click the + button in the top-right to add a Schema.
  • Save Changes (top-left): persists any edits you make on this page.
  • Back: returns to the Analyzer list.
PDF4me AI Document Parser detail page with blue hero banner reading AI Document Parser, subtitle Automatically extract invoice data using AI with fast accurate and structured output, Back button, Save Changes button, Parse Info panel showing Analyzer Id purchase_order_parser, and an empty Schemas section with a blue plus button to add a schema

Analyzer detail. Click the blue + button to add a Document Schema.

Step 4: Define the Document Schema JSON

After clicking +, an empty Schema card opens. Paste a JSON object with two top-level keys:

  • description: one-sentence summary of what the schema extracts. The AI uses this as overall context.
  • fields: array of field definitions. Each field has:
    • fieldName: machine-readable name (no spaces), used as the JSON key in the response.
    • fieldType: string, number, date, or table. Drives parsing and validation.
    • fieldDescription: natural-language hint the AI uses to find this field. Be specific.
PDF4me AI Document Parser Schema 1 card open with a Document Schema editor showing JSON with description Extracting data from purchase orders and sales orders, and a fields array with SalesOrderNumber string Sales Order Number, AgentName string Name of the agent mostly given as Agent Name, and DeptCode (partially visible). Below the editor are Invoice and Purchase Order quick-fill buttons.

Document Schema editor with quick-fill buttons for Invoice and Purchase Order templates.

Example: Purchase Order schema

{
  "description": "Extracting data from purchase orders and sales orders.",
  "fields": [
    {
      "fieldName": "SalesOrderNumber",
      "fieldType": "string",
      "fieldDescription": "Sales Order Number"
    },
    {
      "fieldName": "AgentName",
      "fieldType": "string",
      "fieldDescription": "Name of the agent, mostly given as Agent Name."
    },
    {
      "fieldName": "DeptCode",
      "fieldType": "string",
      "fieldDescription": "Department or cost-centre code printed on the order."
    },
    {
      "fieldName": "OrderDate",
      "fieldType": "date",
      "fieldDescription": "Date the order was placed. Accept formats like DD/MM/YYYY, MM-DD-YYYY, or written out as 5 June 2026."
    },
    {
      "fieldName": "TotalAmount",
      "fieldType": "number",
      "fieldDescription": "Grand total of the order in the document's currency, after taxes."
    }
  ]
}
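
If you keep schemas under version control, the same structure can be generated from code and pasted into the Schema card. The following is a minimal Python sketch; the helper names field and build_schema are hypothetical and not part of any PDF4me SDK, and only the JSON keys come from this guide.

```python
import json


def field(name, ftype, description):
    """Return one field definition with the three keys a schema field needs."""
    return {
        "fieldName": name,
        "fieldType": ftype,
        "fieldDescription": description,
    }


def build_schema(description, fields):
    """Assemble the two top-level keys of a Document Schema."""
    return {"description": description, "fields": list(fields)}


schema = build_schema(
    "Extracting data from purchase orders and sales orders.",
    [
        field("SalesOrderNumber", "string", "Sales Order Number"),
        field("OrderDate", "date", "Date the order was placed."),
        field("TotalAmount", "number", "Grand total of the order, after taxes."),
    ],
)

# Pretty-printed JSON, ready to paste into the Schema card.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Generating the JSON this way keeps key names consistent across Analyzers, but the dashboard remains the place where the schema is actually saved.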

Example: Invoice schema (use the Invoice quick-fill button)

{
  "description": "Extracting data from supplier invoices.",
  "fields": [
    {
      "fieldName": "InvoiceNumber",
      "fieldType": "string",
      "fieldDescription": "Unique invoice identifier, sometimes shown as Invoice No. or INV."
    },
    {
      "fieldName": "InvoiceDate",
      "fieldType": "date",
      "fieldDescription": "Date the invoice was issued."
    },
    {
      "fieldName": "DueDate",
      "fieldType": "date",
      "fieldDescription": "Date payment is due, sometimes shown as Payment Due or Net Due."
    },
    {
      "fieldName": "VendorName",
      "fieldType": "string",
      "fieldDescription": "Company name of the supplier or vendor sending the invoice."
    },
    {
      "fieldName": "TotalAmount",
      "fieldType": "number",
      "fieldDescription": "Grand total in the invoice currency, including taxes."
    }
  ]
}

Quick-fill buttons (Invoice, Purchase Order): these sit at the bottom of the Schema card. Use them as a starting point only. Click one to load a typical schema for that document family, then rename, trim, or extend the fields to match your real documents. The presets are scaffolding, not final schemas.

Schema with a table field (nested rows)

Use fieldType: "table" when you need to extract repeated rows such as invoice line items or purchase-order line items. Each table field carries its own nested fields array describing the columns.

{
  "description": "Invoice data extractor",
  "fields": [
    {
      "fieldName": "invoiceNumber",
      "fieldType": "string",
      "fieldDescription": "Invoice number / bill number / receipt number"
    },
    {
      "fieldName": "invoiceDate",
      "fieldType": "date",
      "fieldDescription": "Look for labels: 'Invoice Date', 'Bill Date', 'Date', 'Dated', 'Issue Date', 'Doc Date'. If 4 digit year not found then consider 2 digit year at the end of extracted date.",
      "fieldMethod": "generate"
    },
    {
      "fieldName": "lineItems",
      "fieldType": "table",
      "fieldDescription": "All product / service rows from the invoice table. Be careful, sometimes a row can be part of the next item like when description goes over one line, but it's of a single item.",
      "fields": [
        {
          "fieldName": "itemNumber",
          "fieldType": "string",
          "fieldDescription": "Product number, product id number or product code"
        },
        {
          "fieldName": "hsnCode",
          "fieldType": "string",
          "fieldDescription": "HSN / SAC code (4 to 8 digit)"
        }
      ]
    }
  ]
}

Field attributes

Attribute | Required? | What it does
fieldName | Required | The name of the field and how it will appear in the response JSON.
fieldType | Required | The type of data to extract. One of string, number, date, or table.
fieldDescription | Required | Natural-language description of what needs to be extracted and where to find it. Include alternate labels and example formats so the AI matches correctly.
fieldMethod | Optional (default extract) | How the AI fills the value. extract takes the value verbatim from the document. generate tells the AI to derive or normalise it (useful for dates, computed totals, or cleaned-up IDs). Omit for default extract behaviour.
fields | Required when fieldType is table | Nested array describing the columns of the table. Each entry takes the same attributes as a top-level field (fieldName, fieldType, fieldDescription, fieldMethod). A nested field cannot itself be of type table.
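
The attribute rules above can be checked before you paste a schema into the dashboard. The following is a rough Python sketch of such a check. The function name validate_fields and its message wording are hypothetical, and the dashboard may enforce additional or different rules than the ones listed in this guide.

```python
ALLOWED_TYPES = {"string", "number", "date", "table"}
ALLOWED_METHODS = {"extract", "generate"}
REQUIRED_KEYS = ("fieldName", "fieldType", "fieldDescription")


def validate_fields(fields, nested=False, path="fields"):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for i, f in enumerate(fields):
        where = f"{path}[{i}]"
        # The three required attributes must be present and non-empty.
        for key in REQUIRED_KEYS:
            if not f.get(key):
                problems.append(f"{where}: missing {key}")
        ftype = f.get("fieldType")
        if ftype is not None and ftype not in ALLOWED_TYPES:
            problems.append(f"{where}: unknown fieldType {ftype!r}")
        # fieldMethod is optional and defaults to extract.
        method = f.get("fieldMethod", "extract")
        if method not in ALLOWED_METHODS:
            problems.append(f"{where}: unknown fieldMethod {method!r}")
        if ftype == "table":
            if nested:
                problems.append(f"{where}: nested field cannot be table")
            elif not f.get("fields"):
                problems.append(f"{where}: table needs nested fields")
            else:
                problems.extend(
                    validate_fields(f["fields"], nested=True, path=f"{where}.fields")
                )
    return problems


example = [
    {
        "fieldName": "lineItems",
        "fieldType": "table",
        "fieldDescription": "All product / service rows from the invoice table.",
        "fields": [
            {
                "fieldName": "itemNumber",
                "fieldType": "string",
                "fieldDescription": "Product number or product code",
            }
        ],
    }
]
print(validate_fields(example))
```

Returning a list of messages rather than raising on the first problem lets you fix a whole schema in one pass.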

fieldType reference

fieldType | Best for | Sample fieldDescription
string | Names, identifiers, codes, free text | Customer name as printed on the invoice header.
number | Amounts, quantities, tax rates, counts | Grand total of the order in the document currency, including taxes.
date | Dates, due dates, issue dates, timestamps | Date the invoice was issued, accept DD/MM/YYYY and 5 June 2026 formats.
table | Repeated rows (line items, addresses, transactions) | All product / service rows from the invoice table. Carries a nested fields array describing the columns.
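
Date values can appear in several layouts, as the sample descriptions in this guide show. If a downstream system needs one canonical form, a post-processing step along the following lines may help. This is a sketch under an assumption this guide does not confirm: that a date field comes back as a string in one of the listed formats. The function name normalise_date is hypothetical.

```python
from datetime import date, datetime

# Formats named in this guide's sample fieldDescription strings; extend as needed.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%m-%d-%Y", "%d %B %Y")


def normalise_date(raw):
    """Return a datetime.date for the first matching format, or None if none match."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


for sample in ("05/06/2026", "06-05-2026", "5 June 2026", "not a date"):
    print(sample, "->", normalise_date(sample))
```

The separators keep DD/MM/YYYY and MM-DD-YYYY apart here; if your documents mix day-first and month-first dates with the same separator, no format list can disambiguate them and the fieldDescription should say which one to expect.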

Step 5: Save Changes and use the Analyzer

Click Save Changes at the top-left to persist the schema. The Analyzer is now active and can be referenced from any platform by its Analyzer Id.

Use the Analyzer in API or automation calls

Once saved, the same Analyzer runs anywhere by reference. You do not need to recreate the schema on each platform.

Field | Source | Purpose
AnalyzerId | The string you typed in Step 2 | Stable identifier the AI Parser uses to locate your schema.
docName | Source PDF filename | Used for tracking and error messages.
docContent | Source PDF encoded as Base64 | The document to extract from.
async | false for synchronous, true for polling | Controls response delivery.

Example REST request body:

{
  "docName": "purchase_order.pdf",
  "docContent": "BASE64_ENCODED_PDF_CONTENT",
  "AnalyzerId": "purchase_order_parser",
  "async": false
}
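
The request body above can be assembled with the standard library alone. The sketch below only builds the JSON payload. It leaves out the endpoint URL and the authentication header because this guide does not state them. The function name build_request_body is hypothetical.

```python
import base64
import json


def build_request_body(pdf_bytes, doc_name, analyzer_id, run_async=False):
    """Build the four-field request body described in the table above."""
    return {
        "docName": doc_name,
        # The API expects the PDF as a Base64 string, not raw bytes.
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        "AnalyzerId": analyzer_id,
        "async": run_async,
    }


# Placeholder bytes stand in for a real file, e.g. open(path, "rb").read().
body = build_request_body(
    b"%PDF-1.7 sample", "purchase_order.pdf", "purchase_order_parser"
)
payload = json.dumps(body, indent=2)
print(payload)
```

Note that json.dumps renders the Python value False as the JSON literal false, which matches the example body.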

The response contains one field per item you defined in fields. Route that JSON into any downstream system: Google Sheets, Airtable, a database, Excel, or a webhook.
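
As an illustration of that routing step, the sketch below turns a parsed result into a spreadsheet-style row. It assumes the parsed result is available as a flat mapping keyed by fieldName, which this guide implies but does not show. The sample values and the helper name to_row are made up for the example.

```python
import csv
import io

# Column order for the destination sheet; names match the schema's fieldName values.
COLUMNS = ["SalesOrderNumber", "AgentName", "OrderDate", "TotalAmount"]


def to_row(parsed, columns=COLUMNS):
    """Pick fields in a fixed column order; fields missing from the result become ''."""
    return [parsed.get(name, "") for name in columns]


# Hypothetical parsed result with OrderDate missing.
parsed = {"SalesOrderNumber": "SO-1001", "AgentName": "J. Doe", "TotalAmount": 1250.5}

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerow(to_row(parsed))
print(buffer.getvalue())
```

Fixing the column order in one place keeps rows aligned even when a document lacks an optional field.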

Common workflows

Typical AI Parser patterns: how a saved Analyzer moves from dashboard to production.
Purchase order inbox to ERP
  1. A new purchase order PDF arrives in a monitored mailbox or upload folder.
  2. Make, Zapier, Power Automate, or n8n calls the AI Parser with AnalyzerId: purchase_order_parser.
  3. The returned JSON (SalesOrderNumber, AgentName, OrderDate, TotalAmount) is mapped into your ERP order-creation API.
  4. A confirmation email goes back to the customer using the parsed order number.
Invoice inbox to accounting spreadsheet
  1. An invoice PDF arrives via webhook, watched folder, or shared inbox.
  2. The AI Parser is called with AnalyzerId: invoice_parser (Invoice quick-fill schema).
  3. The structured response (InvoiceNumber, TotalAmount, DueDate) is appended as a row in Google Sheets or Excel for the accounting team.
Classify-then-Parse router
  1. A single watched folder receives mixed documents (invoices, purchase orders, shipping notes).
  2. A Classify Analyzer routes each file to the right category label.
  3. Based on the label, the workflow calls the matching Parse Analyzer (invoice_parser, purchase_order_parser, shipping_note_parser) and writes the structured output to the right destination.
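
The Classify-then-Parse router comes down to a lookup from category label to Analyzer Id. A minimal Python sketch follows. The label strings are hypothetical, since the categories are whatever you defined in your Classify Analyzer. The Analyzer Ids are the ones used as examples in this guide.

```python
# Hypothetical labels mapped to the Parse Analyzer Ids named in this guide.
PARSERS_BY_LABEL = {
    "invoice": "invoice_parser",
    "purchase_order": "purchase_order_parser",
    "shipping_note": "shipping_note_parser",
}


def pick_analyzer(label):
    """Return the Parse Analyzer Id for a label, or None when no parser is mapped."""
    key = label.strip().lower().replace(" ", "_")
    return PARSERS_BY_LABEL.get(key)


for label in ("Invoice", "Shipping Note", "contract"):
    print(label, "->", pick_analyzer(label))
```

Returning None for an unmapped label lets the workflow send unexpected documents to a manual-review path instead of calling the wrong parser.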

Schema best practices

  • Use descriptive fieldDescription strings. Mention alternate names you see in real documents ("Sales Order Number, also shown as SO No., Order Ref, or PO Ref").
  • Pick fieldType carefully. date and number give the engine parsing hints; string is the fallback when shape is unpredictable.
  • Keep fieldName machine-readable (camelCase or PascalCase, no spaces). It appears as a JSON key in the response.
  • Start from the Invoice or Purchase Order quick-fill, then trim or add fields. The presets are good baselines.
  • Test against at least three real samples before pointing production traffic at the Analyzer, and include edge cases such as missing optional fields, multi-page documents, and OCR-derived text.
  • Version Analyzer Ids when making breaking schema changes (invoice_parser_v1, invoice_parser_v2) so live automations can migrate at their own pace.
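
The versioning convention in the last bullet can be automated with a small helper. This is a sketch of one possible convention, not a PDF4me feature: it assumes a trailing _v<number> suffix and treats an Id with no suffix as version 1. The name next_version is hypothetical.

```python
import re

# Matches Ids that end in _v followed by digits, e.g. invoice_parser_v1.
_VERSION_SUFFIX = re.compile(r"^(?P<base>.+)_v(?P<n>\d+)$")


def next_version(analyzer_id):
    """Return the Analyzer Id for the next schema version."""
    match = _VERSION_SUFFIX.match(analyzer_id)
    if match is None:
        # An unversioned Id is treated as version 1, so the next one is _v2.
        return f"{analyzer_id}_v2"
    return f"{match['base']}_v{int(match['n']) + 1}"


for current in ("invoice_parser_v1", "invoice_parser", "invoice_parser_v9"):
    print(current, "->", next_version(current))
```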

Frequently Asked Questions

What is the difference between Parse and Classify in AI Document Parser?
Parse extracts structured field values from documents using a Document Schema. The response is a JSON object with one value per field you defined. Classify routes a document into one of several categories you defined; the response is a label string. Use Parse when you need data, Classify when you need a routing decision.
How is AI Document Parser different from the older Parse Document templates?
The older Parse Document setup uses Regex Expression or JavaScript Expression applied to drawn capture areas on a sample PDF. AI Document Parser uses a JSON Document Schema with natural-language field descriptions; the AI engine reads the document semantically rather than by fixed positions. No sample PDF is required when defining an AI Analyzer.
What does fieldDescription do?
It is the natural-language hint the AI uses to find each field in the document. Be specific and include alternate names ("Sales Order Number, sometimes shown as SO No."). Descriptions are the single biggest factor in extraction accuracy.
Can I add more than one Schema to the same Analyzer?
Yes. The detail page supports multiple Schemas via the + button. Use this when one Analyzer should handle two related document families (purchase orders and sales orders) with slightly different field sets.
Does the Analyzer Id change after Save Changes?
No. The Analyzer Id is the string you typed in the Add row in Step 2. It stays stable across edits and is the value every API call and automation module uses. Pick a clear naming convention up front.
Do I need to upload a sample PDF when setting up the Analyzer?
No. Unlike the older Parse Document templates, the AI Analyzer does not need a sample for setup. You define a Document Schema in JSON; the AI engine applies it to whatever document you send at runtime.
How do I call the Analyzer from the REST API?
Send docName, docContent (the PDF as Base64), AnalyzerId (the string you typed in Step 2), and async (false for immediate response, true for polling). The response contains one JSON property per field you defined in the Document Schema.
Can the AI Parser handle scanned PDFs?
Yes, when the PDF has been OCR-processed first. Run the source file through the PDF4me OCR endpoint before sending it to the AI Parser. The schema then extracts from the OCR text layer the same way it would from a native PDF.
