
Prepare Parse Info for Document

PDF4me Parse Document enables you to extract structured data from PDF documents using template-based parsing. This guide walks you through creating and configuring parse templates in the PDF4me dashboard to automate data extraction from documents.

Create Template

  1. Go to the parse document page in your PDF4me dashboard
  2. Click Add and enter a name for the template
  3. Click Save to save the template
  4. Click the Edit button to begin configuration

Figure: Parse document template list in PDF4me dashboard

Configure Template

  1. Click Upload Template File and upload a representative PDF file that matches the structure of documents you want to parse
  2. Select the required template file from your uploads
  3. Draw capture areas on the template page by selecting rectangular areas around the data you want to extract
  4. Rename the keys on the left side to reflect the data they will contain (e.g., invoiceNumber, customerName, totalAmount)
  5. Optionally provide a regular expression for individual keys to validate or format the extracted data
  6. Click Test Parse to preview the parsed results with your current configuration
  7. Click Save Changes to save the template configuration

Figure: Parse document template configuration interface showing capture areas and field configuration

Template Configuration Tips

  • Capture Areas: Draw boxes that fully enclose the data fields. Position them accurately to match the document layout
  • Field Names: Use descriptive names (camelCase or snake_case) that clearly indicate what data each field contains
  • Regular Expressions: Use regex patterns to validate extracted data. Common patterns:
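    • Dates: \d{2}/\d{2}/\d{4} for dates written as MM/DD/YYYY (checks the shape only, not whether the month and day are in range)
    • Currency: \$?\d{1,3}(?:,\d{3})*(?:\.\d{2})? for currency amounts with comma-grouped thousands, such as $1,234.56 (an ungrouped amount like 1234.56 does not match in full)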
    • Invoice Numbers: INV-\d{6,10} for invoice number patterns
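
Before entering a pattern in the template editor, it can help to try it against sample values locally. The following is a minimal sketch using Python's standard `re` module with the three patterns listed above. The field names and the `is_valid` helper are illustrative assumptions, and the sketch assumes the whole value must match the pattern; how the PDF4me parser itself applies a pattern (full match or search) is not stated here, so treat the results as a local sanity check only.

```python
import re

# Patterns copied from the tips above. The field names are illustrative
# and should mirror the keys defined in your own template.
PATTERNS = {
    "invoiceDate": r"\d{2}/\d{2}/\d{4}",
    "totalAmount": r"\$?\d{1,3}(?:,\d{3})*(?:\.\d{2})?",
    "invoiceNumber": r"INV-\d{6,10}",
}


def is_valid(field: str, value: str) -> bool:
    """Return True if the whole (stripped) value matches the field's pattern."""
    return re.fullmatch(PATTERNS[field], value.strip()) is not None


if __name__ == "__main__":
    samples = [
        ("invoiceDate", "03/15/2024"),
        ("invoiceDate", "2024-03-15"),
        ("totalAmount", "$1,234.56"),
        ("totalAmount", "1234.56"),
        ("invoiceNumber", "INV-123456"),
        ("invoiceNumber", "INV-12345"),
    ]
    for field, value in samples:
        print(f"{field:14} {value!r:14} {is_valid(field, value)}")
```

Running the script prints one line per sample, which makes it easy to spot a pattern that is stricter or looser than intended before saving the template.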

Using Templates in API Calls

After saving your template, you'll need a template identifier (TemplateId or TemplateName) and a ParseId for API integration:

  • TemplateId: A GUID shown in your template details after saving
  • TemplateName: The name you assigned to the template (alternative to TemplateId)
  • ParseId: A unique GUID for each parsing operation (can be generated client-side)
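
Since the ParseId can be generated client-side, one straightforward option is a random (version 4) GUID from Python's standard `uuid` module. This is a sketch only; the helper name `new_parse_id` is hypothetical, and it assumes the API accepts the usual lowercase hyphenated GUID form used in the example request.

```python
import uuid


def new_parse_id() -> str:
    """Return a fresh random (version 4) GUID string for one parsing operation."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    # A new value is produced on every call.
    print(new_parse_id())
```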

Example API request body:

{
  "docName": "invoice.pdf",
  "docContent": "BASE64_ENCODED_PDF_CONTENT",
  "TemplateId": "12345678-1234-1234-1234-123456789abc",
  "ParseId": "87654321-4321-4321-4321-cba987654321",
  "async": false
}
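
A request body like the one above can be assembled programmatically. The sketch below uses only the Python standard library and stops short of sending anything, because the endpoint URL and authentication scheme are not covered in this section. The function name `build_parse_request` is hypothetical, the `TemplateName` JSON key is an assumption based on the field list above (only `TemplateId` appears in the example request), and requiring exactly one of the two identifiers is a design choice of this sketch rather than a documented rule.

```python
import base64
import json
import uuid
from typing import Optional


def build_parse_request(
    pdf_bytes: bytes,
    doc_name: str,
    template_id: Optional[str] = None,
    template_name: Optional[str] = None,
) -> dict:
    """Build a parse request body shaped like the example above."""
    # Sketch-level choice: insist on exactly one template identifier.
    if (template_id is None) == (template_name is None):
        raise ValueError("Provide exactly one of template_id or template_name")

    payload = {
        "docName": doc_name,
        # The PDF content is sent as Base64 text.
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        # ParseId generated client-side, one per parsing operation.
        "ParseId": str(uuid.uuid4()),
        "async": False,
    }
    if template_id is not None:
        payload["TemplateId"] = template_id
    else:
        # Assumed key name; confirm it against the API reference.
        payload["TemplateName"] = template_name
    return payload


if __name__ == "__main__":
    # Placeholder bytes stand in for a real file read with open(path, "rb").
    body = build_parse_request(
        b"%PDF-1.4 placeholder",
        "invoice.pdf",
        template_id="12345678-1234-1234-1234-123456789abc",
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and passed to whichever HTTP client you use, along with the endpoint and credentials from the PDF4me API reference.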