
Extract Text By Expression using n8n action

PDF4me Extract Text By Expression extracts text that matches a pattern from PDF documents in n8n automation workflows. The PDF can arrive from an n8n trigger or a previous node as binary data, as a base64 string, or from a public URL. Matching text is located with wildcard patterns (%) or regular expressions and can be restricted to specific pages. The results are returned as structured output. Typical applications include invoice number extraction, data field capture, pattern-based text retrieval, document parsing, and content analysis.
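If the PDF is not already available as binary data in n8n, one option is the Base64 String input type, which requires the file's bytes to be encoded as base64 text. The Python sketch below does this with the standard base64 module. The helper names and the file path in the usage note are hypothetical. The header check relies on the fact that PDF files begin with %PDF-, which is also why base64-encoded PDFs start with JVBER.

```python
import base64
from pathlib import Path


def encode_pdf_bytes(data: bytes) -> str:
    """Return PDF bytes as a base64 string (hypothetical helper)."""
    # Every PDF file starts with the header "%PDF-".
    if not data.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF file")
    return base64.b64encode(data).decode("ascii")


def pdf_file_to_base64(path: str) -> str:
    """Read a local PDF file and encode it (hypothetical helper)."""
    return encode_pdf_bytes(Path(path).read_bytes())
```

For example, pdf_file_to_base64("invoice.pdf"), with invoice.pdf as a hypothetical local file, returns text suitable for the Base64 PDF Content parameter.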

Setup

Add the PDF4me "Extract Text By Expression" node to your n8n workflow and configure the required parameters. For initial setup instructions, see our n8n Integration Guide.

Prerequisites:

  • PDF4me API credentials
  • n8n workflow access

Configuration:

  1. Add the PDF4me node to your workflow
  2. Select the "Extract Text By Expression" action
  3. Configure the input parameters (see below)

Figure: Extract Text By Expression configuration

Parameters

Complete list of parameters for the Extract Text By Expression action. Configure these parameters to control text extraction.

Important: Parameters marked with an asterisk (*) are required and must be provided for the action to work.

Input Data Type* (String)
PDF input format selection
  • Choose the format of your PDF data input; PDF4me supports three input types
  • Options: Binary Data, Base64 String, or URL
  Example: Binary Data

Input Binary Field (Binary; required if Input Data Type is "Binary Data")
Binary PDF file input
  • Reference a PDF file from a previous n8n node or a file upload
  • PDF4me processes binary PDF files with automatic format detection
  Example: {{ $binary.data }}

Base64 PDF Content (String; required if Input Data Type is "Base64 String")
Base64-encoded PDF input
  • Provide the PDF content as a base64-encoded string
  • PDF4me decodes and processes the PDF content automatically
  Example: JVBER...

PDF URL (String; required if Input Data Type is "URL")
Public PDF URL input
  • Provide a publicly accessible URL to the PDF file
  • PDF4me downloads the file from the URL and processes it
  Example: https://abc.com/document.pdf

Document Name* (String)
Input filename
  • Name of the input PDF file, including the .pdf extension
  • Used for format detection and processing
  Example: document.pdf

Expression* (String)
Text extraction pattern
  • Pattern that the text to be extracted must match
  • Supports % as a wildcard as well as regular expressions
  Example: %

Page Sequence* (String)
Page range specification
  • Pages to search for matching text
  • Use "all" for the entire document, or specific page numbers and ranges
  • Examples: "1,3,5" (specific pages), "1-5" (range), "1-" (page 1 to the end)
  Example: 1-
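The Page Sequence syntax can be checked before a workflow runs. The Python sketch below expands a sequence string into the page numbers it covers for a document with a known page count. It handles only the forms described above ("all", single pages, "1-5", "1-", and comma-separated combinations). How PDF4me itself treats other input, such as out-of-range pages, is not documented here, so this helper simply rejects it. The function name expand_page_sequence is hypothetical.

```python
def expand_page_sequence(sequence: str, page_count: int) -> list[int]:
    """Expand a Page Sequence string into sorted 1-based page numbers.

    Hypothetical helper: supports "all", single pages ("3"),
    closed ranges ("1-5"), open-ended ranges ("2-"), and
    comma-separated combinations ("1,3,5-7").
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    text = sequence.strip().lower()
    if text == "all":
        return list(range(1, page_count + 1))

    pages = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in {sequence!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range such as "1-" runs to the last page.
            end = int(end_text) if end_text.strip() else page_count
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid range {part!r} for {page_count} pages")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For example, expand_page_sequence("1-", 3) returns [1, 2, 3], and expand_page_sequence("1,3,5", 6) returns [1, 3, 5].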

Advanced Options

The following parameters are available in the Advanced Options section and are optional:

Custom Profiles (String)
Custom configuration profiles
  • Sets additional options through custom profiles, written as JSON containing predefined parameters
  • Enables advanced extraction settings; needed only for specialized requirements
  Example: { "outputDataFormat": "json" }
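Because Custom Profiles expects JSON text, generating it with a JSON serializer avoids quoting mistakes when the profile is assembled dynamically, for example in an upstream step. A minimal Python sketch follows. Only the outputDataFormat key comes from the example above; which other keys PDF4me accepts is not covered here, and the helper name is hypothetical.

```python
import json


def build_custom_profile(options: dict) -> str:
    """Serialize profile options into JSON text for Custom Profiles."""
    # json.dumps handles quoting and escaping of keys and values.
    return json.dumps(options)


profile = build_custom_profile({"outputDataFormat": "json"})
print(profile)  # {"outputDataFormat": "json"}
```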

Output

Output Parameters

fileName (String)
  Name of the generated results file, including its extension and a timestamp that keeps the name unique.
  Example: text_extraction_results_1756999697398.json

mimeType (String)
  MIME type of the results file: typically application/json for structured data, or application/zip when multiple files are returned.
  Example: application/json

fileSize (Number)
  Size of the results file in bytes, useful for storage planning and for monitoring file transfers.
  Example: 106

success (Boolean)
  true if the text extraction succeeded, false if an error occurred; use it for error handling in automated workflows.
  Example: true

message (String)
  Status message describing the result: a confirmation on success, or error details for troubleshooting.
  Example: Text extraction by expression completed successfully

docName (String)
  Original filename of the input PDF, kept for audit trails, debugging, and tracing extracted content back to its source.
  Example: document.pdf

expression (String)
  The pattern that was applied during extraction, returned for verification and debugging.
  Example: %

pageSequence (String)
  The page range that was searched for the expression, useful for verification and troubleshooting.
  Example: 1-2

n8n Action Response

The PDF4me Extract Text By Expression action returns its result as a JSON object containing the output parameters listed above.

JSON Response Format

The raw JSON response from the API:

{
  "fileName": "text_extraction_results_1756999697398.json",
  "mimeType": "application/json",
  "fileSize": 106,
  "success": true,
  "message": "Text extraction by expression completed successfully",
  "docName": "document.pdf",
  "expression": "%",
  "pageSequence": "1-2"
}
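Because success is false when extraction fails, a downstream step can check the response before using the results file. The Python sketch below does this for a response shaped like the example above. The function name check_extraction_response, the ExtractionError class, and the choice to raise an exception are assumptions for illustration, not part of the PDF4me node. In an n8n workflow, the same branching can also be done without code by an IF node that tests the success field.

```python
import json

# The example response shown above.
RESPONSE_JSON = """
{
  "fileName": "text_extraction_results_1756999697398.json",
  "mimeType": "application/json",
  "fileSize": 106,
  "success": true,
  "message": "Text extraction by expression completed successfully",
  "docName": "document.pdf",
  "expression": "%",
  "pageSequence": "1-2"
}
"""


class ExtractionError(RuntimeError):
    """Raised when the response reports a failed extraction."""


def check_extraction_response(response: dict) -> str:
    """Return the results filename, or raise if extraction failed."""
    if response.get("success") is not True:
        # On failure, message carries the error details.
        raise ExtractionError(response.get("message", "unknown error"))
    return response["fileName"]


response = json.loads(RESPONSE_JSON)
print(check_extraction_response(response))
```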

Use Cases

Data Mining and Information Extraction

  • Extract specific data patterns, contact information, and structured data from PDF documents for data mining and business intelligence
  • Extract specific clauses, terms, and other legal information from contracts, legal documents, and official records
  • Extract amounts, transaction details, and other financial data from financial reports, invoices, and accounting documents
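Extracting items such as invoice numbers or amounts depends on an Expression that matches them reliably. Because Expression accepts regular expressions, one way to refine a pattern is to test it locally on sample text copied from a PDF before running the workflow. The Python sketch below uses the standard re module. The sample text and both patterns are hypothetical, and PDF4me's regular-expression dialect may differ from Python's in some details, so confirm the final pattern in n8n.

```python
import re

# Hypothetical sample text, as it might be copied from an invoice PDF.
SAMPLE_TEXT = """
Invoice Number: INV-104233
Date: 2024-03-18
Total Due: 1,250.00 EUR
"""

# Hypothetical candidate patterns for the Expression field.
INVOICE_PATTERN = r"INV-\d+"
AMOUNT_PATTERN = r"\d{1,3}(?:,\d{3})*\.\d{2}"  # e.g. 1,250.00


def find_matches(pattern: str, text: str) -> list[str]:
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)


print(find_matches(INVOICE_PATTERN, SAMPLE_TEXT))
print(find_matches(AMOUNT_PATTERN, SAMPLE_TEXT))
```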

Content Analysis and Research

  • Extract research data, citations, and other academic information from PDF documents for research and content analysis
  • Extract specific data, measurements, and findings from scientific papers, research documents, and technical publications
  • Extract metrics, statistics, and other business data from business reports, market research, and industry analyses

Compliance and Regulatory Processing

  • Extract compliance data, regulatory information, and audit details from PDF documents for regulatory compliance and audit processing
  • Extract compliance data, standards, and regulatory information from quality control documents, inspection reports, and certification materials
  • Extract regulatory data, compliance metrics, and other official information from official documents, permits, and regulatory filings

Get Help