
AI Document Parser using Classify

What this guide covers

Classify, one of the two Analyzer Types in AI Document Parser, lets one Analyzer handle several document variants in a single call. You define an Analyzer Id, then add one Schema per variant (Client ABC, Client XYZ, generic invoice, generic purchase order). Each Schema carries a Classification Name, a Classification Prompt in natural language, and a Document Schema describing the fields to extract. At runtime the AI matches the incoming PDF against every Schema's Classification Prompt, picks the best fit, and returns both the matched Classification Name and the extracted fields. The same Analyzer Id works from the REST API and every supported automation platform.


Authenticating Your Setup

Classify Analyzers are created in the PDF4me developer dashboard. Sign in with your account, then create or copy an API key for the AI Parser API calls that reference the Analyzer you build here.

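Dashboard URLs you will use in this guide:
  • dev.pdf4me.com (sign-in and dashboard)
  • https://developer.pdf4me.com/ai-document-parser/?id=<your-analyzer-guid> (Analyzer detail page)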

Important Facts You Should Not Miss

One Analyzer, many Schemas (one per variant)
Classify is built for the real world where Client ABC's invoices look different from Client XYZ's. Add as many Schemas as you have variants. Each Schema is a self-contained Classification Name + Prompt + Document Schema bundle.
The Classification Prompt is the routing brain
Write it in plain English and mention concrete markers: the company name at the top, the contact email, layout cues, and words that only this variant uses. The clearer the prompt, the cleaner the routing.
Classify returns BOTH the label and the data
Unlike a pure classifier, the response includes the matched Classification Name and all fields from that Schema's Document Schema. One call gives you the routing decision and the extracted JSON together.

Step 1: Open AI Document Parser in the dashboard

  1. Sign in at dev.pdf4me.com.
  2. From the dashboard sidebar, click AI Document Parser.
  3. The list page shows every existing Analyzer with three columns: Analyzer Id, Analyzer Type (Parse or Classify), and Actions.
  4. Click the blue + Add button to start a new Analyzer.

AI Document Parser list view. Click + Add to start a new Analyzer.

Step 2: Create a Classify Analyzer

A new row appears with three controls:

  1. Analyzer Id input: type any clear identifier you will remember, for example client_invoices, Vendor Documents, or incoming-orders. There is no naming format restriction; snake_case, camelCase, kebab-case, and plain words with spaces all work the same. The example below uses client_invoices.
  2. Analyzer Type dropdown: pick Classify (this guide) or Parse (see the Parse guide).
  3. Save / Cancel buttons: Save creates the Analyzer; Cancel discards the row.

Add row: enter the Analyzer Id (here client_invoices), pick Classify, then Save.

Step 3: Open the Analyzer and add the first Schema

Click the new row to open the detail page. The URL pattern is:

https://developer.pdf4me.com/ai-document-parser/?id=<your-analyzer-guid>

The dashboard issues a GUID per Analyzer the first time you open it. Bookmark this URL to jump straight back to the same Classify Analyzer next time.
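If you track several Analyzers in a script or config file, the detail-page bookmark can be rebuilt from its GUID using the URL pattern above. A minimal Python sketch: `analyzer_detail_url` is a hypothetical helper, and the all-zero GUID is a placeholder rather than a real Analyzer.

```python
from urllib.parse import urlencode

# Base path from the detail-page URL pattern in Step 3.
DETAIL_BASE = "https://developer.pdf4me.com/ai-document-parser/"


def analyzer_detail_url(analyzer_guid: str) -> str:
    """Return the dashboard detail-page URL for one Analyzer GUID."""
    # urlencode percent-escapes anything that is unsafe in a query string.
    return DETAIL_BASE + "?" + urlencode({"id": analyzer_guid})


print(analyzer_detail_url("00000000-0000-0000-0000-000000000000"))
```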

The detail page shows:

  • Classify Info (left): shows the Analyzer Id you typed in Step 2 (here client_invoices). This panel is read-only.
  • Schemas (right): empty by default. Each Schema is one document variant. Click the + button in the top-right to add the first Schema.
  • Save Changes (top-left): persists every edit you make on this page.
  • Back: returns to the Analyzer list.

Detail page. Classify Info on the left holds the Analyzer Id. Click the blue + button to add a Schema.

Step 4: Fill Classification Name, Prompt, and Document Schema

Each Schema is one document variant the Analyzer must handle. Every Schema has three required inputs, plus a row of quick-fill buttons at the bottom.

Schema fields (required)

Field | Purpose | Example
Classification Name | The label returned by the API when this Schema matches. Use a short, business-friendly name. | Client ABC
Classification Prompt | Natural-language description of how to recognise this variant. Mention the company name, email, layout cues, and distinguishing phrases. This is what the AI uses to route. | The invoice has the company name ABC at the top, followed by the contact email [email protected].
Document Schema | JSON object with a description and a fields array. Each field has fieldName, fieldType, and fieldDescription, plus an optional fieldMethod such as generate for derived values. | { "description": "Extract invoice detail for ABC invoices", "fields": [ ... ] }
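The three required inputs map onto a small record. The Python sketch below is a local pre-flight check, assuming you draft Schemas in code before pasting them into the dashboard; `ClassifySchema` and `problems()` are hypothetical names, not part of a PDF4me SDK, and the check covers only the required inputs listed in the table.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ClassifySchema:
    """One document variant inside a Classify Analyzer (local model only)."""
    classification_name: str         # label returned when this Schema matches
    classification_prompt: str       # natural-language routing hint
    document_schema: dict[str, Any]  # {"description": ..., "fields": [...]}

    def problems(self) -> list[str]:
        """List missing required inputs; an empty list means the Schema is complete."""
        issues: list[str] = []
        if not self.classification_name.strip():
            issues.append("Classification Name is empty")
        if not self.classification_prompt.strip():
            issues.append("Classification Prompt is empty")
        schema = self.document_schema if isinstance(self.document_schema, dict) else {}
        if not isinstance(schema.get("description"), str):
            issues.append("Document Schema needs a description string")
        if not isinstance(schema.get("fields"), list):
            issues.append("Document Schema needs a fields array")
        return issues


abc = ClassifySchema(
    classification_name="Client ABC",
    classification_prompt="The invoice has the company name ABC at the top.",
    document_schema={"description": "Extract invoice detail for ABC invoices", "fields": []},
)
print(abc.problems())  # []
```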

Example Schema: Client ABC


First Schema. Classification Name, Classification Prompt, and Document Schema together describe one document variant.

Client ABC: Document Schema
{
  "description": "Extract invoice detail for ABC invoices",
  "fields": [
    {
      "fieldName": "invoiceNumber",
      "fieldType": "string",
      "fieldDescription": "Invoice number / bill number / receipt number"
    },
    {
      "fieldName": "invoiceDate",
      "fieldType": "date",
      "fieldMethod": "generate",
      "fieldDescription": "Look for labels: 'Invoice Date', 'Bill Date', 'Date', 'Dated', 'Issue Date', 'Doc Date'. If 4-digit year not found then consider 2-digit year at the end of extracted date."
    }
  ]
}

The Classification Prompt for Client ABC is the natural-language hint that makes the AI route to this Schema:

"The invoice has the company name ABC at the top, followed by the contact email [email protected]."

Step 5: Add more Schemas and Save Changes

Click the blue + button again to add a Schema for the next variant. Each new Schema gets its own Classification Name, Classification Prompt, and Document Schema, and the fields can differ entirely between Schemas: Client ABC may have no customer reference column while Client XYZ does.

Example second Schema: Client XYZ


Second Schema for the same Analyzer. Both Schemas now live under client_invoices.

Client XYZ: Document Schema
{
  "description": "Extract Invoice data from company XYZ's Invoice",
  "fields": [
    {
      "fieldName": "invoiceNumber",
      "fieldType": "string",
      "fieldDescription": "Invoice number / bill number / receipt number"
    },
    {
      "fieldName": "invoiceDate",
      "fieldType": "date",
      "fieldMethod": "generate",
      "fieldDescription": "Look for labels: 'Invoice Date', 'Bill Date', 'Date', 'Dated', 'Issue Date', 'Doc Date'. If 4-digit year not found then consider 2-digit year at the end of extracted date."
    }
  ]
}

The Schemas list collapses after you Save

Once both Schemas are in, the list shows each one as a collapsible card with the Classification Name on the left and a chevron to expand on the right. Click a card to edit it again later.


Both Schemas saved. Click any card to expand and edit; use + to add more variants.

Click Save Changes at the top-left to persist every Schema you added. The Analyzer is now live and ready to receive PDFs.

Field-level options inside Document Schema

Attribute | Required? | What it does
fieldName | Required | The name of the field and how it will appear in the response JSON.
fieldType | Required | The type of data to extract: one of string, number, date, or table.
fieldDescription | Required | Natural-language description of what needs to be extracted and where to find it. Include alternate labels and example formats so the AI matches correctly.
fieldMethod | Optional (default extract) | How the AI fills the value. extract takes the value verbatim from the document. generate tells the AI to derive or normalise it (useful for dates, computed totals, or cleaned-up IDs). Omit it for the default extract behaviour.
fields | Required when fieldType is table | Nested array describing the columns of the table. Each entry takes the same attributes as a top-level field (fieldName, fieldType, fieldDescription, fieldMethod), but a nested entry cannot itself be of type table.
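The attribute rules in this table are simple enough to check locally before you paste a Document Schema into the editor. A Python sketch, assuming extract and generate are the only fieldMethod values (the two the table names) and treating an empty column list as an error; `validate_fields` is a hypothetical helper and does not reproduce whatever validation the dashboard itself performs.

```python
from typing import Any

# The four fieldType values listed in the table above.
FIELD_TYPES = {"string", "number", "date", "table"}
# Assumption: extract and generate are the only fieldMethod values.
FIELD_METHODS = {"extract", "generate"}


def validate_fields(fields: list[dict[str, Any]], nested: bool = False,
                    path: str = "fields") -> list[str]:
    """Check a fields array against the attribute rules; return error messages."""
    errors: list[str] = []
    for i, field in enumerate(fields):
        where = f"{path}[{i}]"
        for key in ("fieldName", "fieldType", "fieldDescription"):
            if not field.get(key):
                errors.append(f"{where}: missing {key}")
        ftype = field.get("fieldType")
        if ftype and ftype not in FIELD_TYPES:
            errors.append(f"{where}: unknown fieldType {ftype!r}")
        # fieldMethod is optional; leaving it out means extract.
        method = field.get("fieldMethod", "extract")
        if method not in FIELD_METHODS:
            errors.append(f"{where}: unknown fieldMethod {method!r}")
        if ftype == "table":
            if nested:
                errors.append(f"{where}: a table column cannot itself be a table")
            columns = field.get("fields")
            if not isinstance(columns, list) or not columns:
                errors.append(f"{where}: table field needs a non-empty fields array")
            else:
                # Columns take the same attributes as top-level fields.
                errors.extend(validate_fields(columns, nested=True, path=f"{where}.fields"))
    return errors


print(validate_fields([{"fieldName": "invoiceDate", "fieldType": "date",
                        "fieldDescription": "Invoice date", "fieldMethod": "generate"}]))  # []
```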

Schema with a table field (nested rows)

Use fieldType: "table" when a Schema needs to extract repeated rows such as invoice line items. Each table field carries its own nested fields array describing the columns.

{
  "description": "Invoice data extractor",
  "fields": [
    {
      "fieldName": "invoiceNumber",
      "fieldType": "string",
      "fieldDescription": "Invoice number / bill number / receipt number"
    },
    {
      "fieldName": "invoiceDate",
      "fieldType": "date",
      "fieldDescription": "Look for labels: 'Invoice Date', 'Bill Date', 'Date', 'Dated', 'Issue Date', 'Doc Date'. If 4 digit year not found then consider 2 digit year at the end of extracted date.",
      "fieldMethod": "generate"
    },
    {
      "fieldName": "lineItems",
      "fieldType": "table",
      "fieldDescription": "All product / service rows from the invoice table. Be careful, sometimes a row can be part of the next item like when description goes over one line, but it's of a single item.",
      "fields": [
        {
          "fieldName": "itemNumber",
          "fieldType": "string",
          "fieldDescription": "Product number, product id number or product code"
        },
        {
          "fieldName": "hsnCode",
          "fieldType": "string",
          "fieldDescription": "HSN / SAC code (4 to 8 digit)"
        }
      ]
    }
  ]
}
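Writing the nested JSON by hand gets error-prone once a Schema has several tables. This Python sketch builds the same shape from two helper functions and prints JSON you could paste into the Document Schema editor; `field` and `table` are hypothetical helpers, and the invoiceDate and lineItems descriptions are shortened from the example above.

```python
import json
from typing import Any, Optional


def field(name: str, ftype: str, description: str,
          method: Optional[str] = None) -> dict[str, Any]:
    """Build one Document Schema field; fieldMethod is left out unless given."""
    entry: dict[str, Any] = {"fieldName": name, "fieldType": ftype,
                             "fieldDescription": description}
    if method is not None:
        entry["fieldMethod"] = method
    return entry


def table(name: str, description: str, columns: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a table field whose nested fields array describes its columns."""
    return {"fieldName": name, "fieldType": "table",
            "fieldDescription": description, "fields": columns}


schema = {
    "description": "Invoice data extractor",
    "fields": [
        field("invoiceNumber", "string", "Invoice number / bill number / receipt number"),
        field("invoiceDate", "date", "Look for labels: 'Invoice Date', 'Bill Date', 'Date'.",
              method="generate"),
        table("lineItems", "All product / service rows from the invoice table.", [
            field("itemNumber", "string", "Product number, product id number or product code"),
            field("hsnCode", "string", "HSN / SAC code (4 to 8 digit)"),
        ]),
    ],
}

print(json.dumps(schema, indent=2))
```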

Quick-fill buttons (use as a starting point)

The Schema editor offers Invoice and Purchase Order preset buttons at the bottom of the Document Schema card. Use them as a starting point only. Click one to populate a typical schema for that document family, then tweak field names, types, methods, and descriptions to match the specific variant you are configuring. The presets are scaffolding, not the final shape.

Use the Classify Analyzer in API or automation calls

Once saved, the same Analyzer Id runs anywhere by reference. You do not need to recreate Schemas on each platform.

Field | Source | Purpose
AnalyzerId | The string you typed in Step 2 (client_invoices) | Stable identifier for the Classify Analyzer.
docName | Source PDF filename | Used for tracking and error messages.
docContent | Source PDF encoded as Base64 | The document to classify and extract from.
async | false for synchronous, true for polling | Controls response delivery.
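The four fields can be assembled in a few lines. A minimal Python sketch using only the standard library; `build_classify_body` is a hypothetical helper, and its parameter is named `run_async` because `async` is a reserved word in Python.

```python
import base64
from typing import Any


def build_classify_body(doc_name: str, pdf_bytes: bytes, analyzer_id: str,
                        run_async: bool = False) -> dict[str, Any]:
    """Assemble the request fields listed in the table above."""
    return {
        "docName": doc_name,  # source filename, used for tracking
        # docContent carries the raw PDF bytes as a Base64 string.
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        "AnalyzerId": analyzer_id,  # the string typed in Step 2
        "async": run_async,  # False = synchronous response
    }


body = build_classify_body("incoming_invoice.pdf", b"%PDF-1.4 ...", "client_invoices")
```

To build the body from a file on disk, pass `path.name` and `path.read_bytes()` from a `pathlib.Path` pointing at the PDF.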

Example REST request body:

{
  "docName": "incoming_invoice.pdf",
  "docContent": "BASE64_ENCODED_PDF_CONTENT",
  "AnalyzerId": "client_invoices",
  "async": false
}
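Sending the body is a plain JSON POST. The guide does not state the AI Parser endpoint or the exact authentication header, so both are assumptions in this Python sketch: `AI_PARSER_URL` is a placeholder to replace with the endpoint from the PDF4me API reference, and passing the API key in an `Authorization` header is a guess to verify there. The sketch only prepares the request; it does not send it.

```python
import json
import urllib.request
from typing import Any

# Placeholder, not a real endpoint: substitute the AI Parser URL from the API reference.
AI_PARSER_URL = "https://api.example.invalid/ai-parser"


def prepare_request(body: dict[str, Any], api_key: str) -> urllib.request.Request:
    """Wrap the JSON body in a POST request without sending it."""
    return urllib.request.Request(
        AI_PARSER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key travels in the Authorization header.
            "Authorization": api_key,
        },
        method="POST",
    )


body = {
    "docName": "incoming_invoice.pdf",
    "docContent": "BASE64_ENCODED_PDF_CONTENT",
    "AnalyzerId": "client_invoices",
    "async": False,
}
req = prepare_request(body, "YOUR_API_KEY")
```

With the real endpoint in place, `urllib.request.urlopen(req)` would send it synchronously (matching `"async": false`), and `json.load` on the returned response would give you the parsed result.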

The response carries the matched Classification Name plus the fields defined in that Schema's Document Schema. A document recognised as Client ABC returns the Client ABC fields; a document recognised as Client XYZ returns the Client XYZ fields. One call, both decisions.

Common workflows

Typical Classify Analyzer patterns: how a Classify Analyzer with multiple Schemas runs in production.
Multi-client AP inbox
  1. Vendor invoices from Client ABC, Client XYZ, and a handful of others arrive in a single mailbox.
  2. A Make, Zapier, Power Automate, or n8n workflow calls AI Parser with AnalyzerId: client_invoices.
  3. The response carries the Classification Name (Client ABC, Client XYZ, etc.) and the extracted fields.
  4. A switch step routes each result to the matching ERP customer record using the Classification Name; the extracted fields populate the invoice record directly.
Vendor onboarding without rebuilds
  1. A new vendor sends an invoice with a different layout for the first time.
  2. The operations team opens the existing client_invoices Analyzer in the dashboard and clicks +.
  3. A new Schema is added with a Classification Name (the vendor's name), a Classification Prompt describing the layout, and a Document Schema for the fields that matter.
  4. The next inbound PDF is routed to the new Schema automatically. No automation rewiring needed.
Mixed-document triage
  1. One Analyzer covers invoices, purchase orders, and credit notes.
  2. Each Schema's Classification Prompt describes the distinguishing markers ("contains Bill To and a payable total" vs. "contains a PO number and a Ship To address").
  3. The Document Schema per Schema extracts the fields relevant to that document type.
  4. Downstream routing branches on the Classification Name; each branch writes to the matching system.
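The switch step these workflows rely on can be sketched in Python. The guide says the response carries the matched Classification Name and that Schema's fields but does not show the JSON, so the keys `classificationName` and `fields` below are hypothetical; map them to the real keys from your first test call. The handlers are stand-ins for ERP writes, and any name without a route falls through to a manual review queue.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], str]


def to_abc_erp(fields: dict[str, Any]) -> str:
    return f"ABC record for invoice {fields.get('invoiceNumber')}"


def to_xyz_erp(fields: dict[str, Any]) -> str:
    return f"XYZ record for invoice {fields.get('invoiceNumber')}"


def to_manual_review(fields: dict[str, Any]) -> str:
    return "queued for manual review"


# Keys must match the Classification Names exactly as typed in the dashboard.
ROUTES: dict[str, Handler] = {
    "Client ABC": to_abc_erp,
    "Client XYZ": to_xyz_erp,
}


def route(response: dict[str, Any]) -> str:
    """Dispatch one parser response; unknown names go to manual review."""
    # Hypothetical key names; adjust to the actual response JSON.
    name = response.get("classificationName", "")
    handler = ROUTES.get(name, to_manual_review)
    return handler(response.get("fields", {}))


print(route({"classificationName": "Client ABC", "fields": {"invoiceNumber": "INV-001"}}))
# prints: ABC record for invoice INV-001
```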

Schema best practices for Classify

  • One Schema per stable document variant. Two vendors with very different layouts should be two Schemas, not one with overlapping rules.
  • The Classification Prompt is your routing rule. Write it in plain English and include concrete markers (company name, email, distinctive phrases, layout cues).
  • Keep the Classification Name short, business-friendly, and stable. It is what your downstream routing branches on.
  • Use the Invoice and Purchase Order quick-fill buttons as starting points, then tighten field descriptions for each variant.
  • Add fieldMethod: "generate" to dates, computed totals, or fields where the AI should normalize the value instead of copying it verbatim.
  • Test each Schema against a real sample before going live: an obvious Client ABC invoice should always route to the Client ABC Schema, not Client XYZ.
  • Click Save Changes after every editing round; unsaved Schemas are lost on navigation.
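Testing each Schema against real samples is easy to repeat before every change. In this Python sketch, `classify` is a caller-supplied function (hypothetical; for example, one that sends the request body and returns the Classification Name from the response), and the sample filenames are made up. The dry run uses a dictionary in place of real API calls.

```python
from typing import Callable


def check_routing(expected: dict[str, str], classify: Callable[[str], str]) -> list[str]:
    """Compare the Classification Name returned for each sample with the expected one."""
    mismatches: list[str] = []
    for sample, want in expected.items():
        got = classify(sample)  # e.g. a wrapper around one AI Parser call
        if got != want:
            mismatches.append(f"{sample}: expected {want!r}, got {got!r}")
    return mismatches


# Stand-in results for a dry run; replace with a real API-backed classify function.
fake_results = {"abc_sample.pdf": "Client ABC", "xyz_sample.pdf": "Client ABC"}
report = check_routing(
    {"abc_sample.pdf": "Client ABC", "xyz_sample.pdf": "Client XYZ"},
    fake_results.__getitem__,
)
print(report)
```

An empty list means every sample routed to its intended Schema; any entry points at a Classification Prompt that needs sharper markers.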

Frequently Asked Questions

How is Classify different from Parse in AI Document Parser?
Parse holds a single Document Schema and returns its extracted fields. Classify holds multiple Schemas, one per document variant, each with its own Classification Prompt and Document Schema. At runtime the AI picks the best-matching Schema and returns its fields plus the Classification Name. Use Classify when one Analyzer must handle several layouts.
What is the Classification Prompt?
A natural-language description of the markers that identify this document variant. Mention the company name printed at the top, the contact email, layout cues, anything distinctive. The AI compares every Schema's Classification Prompt against the incoming PDF to pick the best match.
Can each Schema have different fields?
Yes. Each Schema has its own Document Schema, so Client ABC can extract invoiceNumber and invoiceDate while Client XYZ extracts invoiceNumber, customerRef, and PONumber. The response contains exactly the fields defined in the matched Schema.
What does fieldMethod: generate do?
Inside a Document Schema, fieldMethod controls how the AI fills the value. generate tells the AI to derive or normalize the value rather than copy a substring verbatim. Useful for dates (normalize multiple formats), totals (compute or normalize), and identifiers that may need cleanup. Omit fieldMethod for straight extraction.
How many Schemas can I add to one Analyzer?
No hard cap is published. In practice, 5 to 20 well-described Schemas per Analyzer perform best. If you need dozens, group similar variants under broader Classification Names and let the Document Schema handle the small differences.
Does the Analyzer Id change after Save Changes?
No. The Analyzer Id is the string you typed in the Add row in Step 2 (for example client_invoices). It stays stable across edits and is the value every API call and automation module uses.
Do I need to upload a sample PDF when setting up the Analyzer?
No. The AI Analyzer does not need a sample for setup. You describe each variant in natural language via the Classification Prompt; the AI engine applies the prompts to whatever document you send at runtime.
What happens if the incoming PDF does not match any Schema confidently?
The AI still returns its best guess. To handle ambiguous cases, add a catch-all Schema with a generic Classification Prompt and a minimal Document Schema, or route low-confidence responses to a manual review queue in your automation.

Get Help