Extract Text By Expression using n8n action
PDF4me Extract Text By Expression pulls specific text out of PDF documents by pattern matching and expression-based filtering in n8n automation workflows. The PDF can arrive from an n8n trigger or a previous node as binary data, as a base64 string, or from a public URL. Text is located with wildcard patterns (%), regular expressions, or specific data fields. The search can be limited to selected pages, and the results are returned as structured output. Typical uses include invoice number extraction, data field capture, pattern-based text retrieval, document parsing, and content analysis, wherever text has to be targeted precisely.
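Wildcard patterns and regular expressions behave differently. The sketch below imitates both locally in Python, so you can try a pattern on sample text before configuring the node. It makes two assumptions. First, % works like the SQL LIKE wildcard (any run of characters, matched against a whole line). Second, regular expressions follow Python's re syntax. PDF4me's exact matching rules may differ. The function names and the sample text are hypothetical.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    # Assumption: '%' behaves like the SQL LIKE wildcard (any run of characters);
    # every other character is matched literally.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("%")))

def match_wildcard(text: str, pattern: str) -> list[str]:
    # Test the pattern against each line; keep lines that match it completely.
    regex = like_to_regex(pattern)
    return [line for line in text.splitlines() if regex.fullmatch(line)]

def match_regex(text: str, pattern: str) -> list[str]:
    # Return every substring matched by a regular expression (Python re syntax).
    return [m.group(0) for m in re.finditer(pattern, text)]

# Hypothetical page text for illustration.
sample = "ACME Corp\nInvoice No: INV-20231\nTotal: 250.00 EUR"

print(match_wildcard(sample, "Invoice No: %"))  # ['Invoice No: INV-20231']
print(match_regex(sample, r"INV-\d+"))          # ['INV-20231']
print(match_wildcard(sample, "%"))              # ['ACME Corp', 'Invoice No: INV-20231', 'Total: 250.00 EUR']
```

A lone % matches every line, which fits the "%" example used throughout this page. A regular expression is the better choice when only part of a line, such as the invoice number itself, is wanted.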
Setup
Add the PDF4me "Extract Text By Expression" node to your n8n workflow and configure the required parameters. For initial setup instructions, see our n8n Integration Guide.
Prerequisites:
- PDF4me API credentials
- n8n workflow access
Configuration:
- Add the PDF4me node to your workflow
- Select the "Extract Text By Expression" action
- Configure the input parameters (see below)

Parameters
Complete list of parameters for the Extract Text By Expression action. Configure these parameters to control text extraction.
Important: Parameters marked with an asterisk (*) are required and must be provided for the action to function correctly.
| Parameter | Type | Description | Example |
|---|---|---|---|
| Input Data Type* | String | PDF Input Format Selection • Choose the format of your PDF data input • PDF4me supports multiple input types • Options: Binary Data, Base64 String, or URL | Binary Data |
| Input Binary Field | Binary | Binary PDF File Input (Required if Binary Data) • Reference PDF file from previous n8n node or file upload • PDF4me processes binary PDF files with automatic format detection • Required when Input Data Type is "Binary Data" | {{ $binary.data }} |
| Base64 PDF Content | String | Base64 Encoded PDF Input (Required if Base64 String) • Provide PDF content as base64 encoded string • PDF4me automatically decodes and processes the PDF content • Required when Input Data Type is "Base64 String" | JVBER... |
| PDF URL | String | Public PDF URL Input (Required if URL) • Provide a public/open permission URL to the PDF file • PDF4me downloads and processes the file from URL • Required when Input Data Type is "URL" | https://abc.com/document.pdf |
| Document Name* | String | Input Filename • Specify the name of the input PDF file • Used for format detection and processing optimization • Must include the .pdf extension | document.pdf |
| Expression* | String | Text Extraction Pattern • Define the pattern or expression to match text content • Supports regex patterns, wildcards, and custom expressions • Use % as a wildcard, or write a specific regex pattern | % |
| Page Sequence* | String | Page Range Specification • Define which pages to process for text extraction • Use "all" for the entire document, or specific page numbers or ranges • Examples: "1,3,5" (specific pages), "1-5" (range), "1-" (from page 1 to the end) | 1- |
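The conditional requirements in this table can be checked before a request is sent. The following sketch validates a parameter set and expands a Page Sequence into page numbers. The dictionary keys (input_data_type, base64_pdf_content, and so on) are hypothetical names for the fields above, not the node's internal property names. The page-range parser covers only the notations listed in the table ("all", "1,3,5", "1-5", "1-") and is not PDF4me's own parser.

```python
import base64

# Hypothetical keys standing in for the node fields above (not n8n's internal names).
SOURCE_FIELD_FOR_TYPE = {
    "Binary Data": "input_binary_field",
    "Base64 String": "base64_pdf_content",
    "URL": "pdf_url",
}

def validate_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the required fields are present."""
    errors = []
    input_type = params.get("input_data_type")
    if input_type not in SOURCE_FIELD_FOR_TYPE:
        errors.append(f"unknown Input Data Type: {input_type!r}")
    elif not params.get(SOURCE_FIELD_FOR_TYPE[input_type]):
        # Only the source field matching the selected input type is required.
        errors.append(f"{SOURCE_FIELD_FOR_TYPE[input_type]} is required for {input_type}")
    if not str(params.get("document_name", "")).lower().endswith(".pdf"):
        errors.append("Document Name must include the .pdf extension")
    for key in ("expression", "page_sequence"):
        if not params.get(key):
            errors.append(f"{key} is required")
    return errors

def expand_page_sequence(sequence: str, page_count: int) -> list[int]:
    """Expand 'all', '1,3,5', '1-5' or '1-' into sorted 1-based page numbers."""
    if sequence.strip().lower() == "all":
        return list(range(1, page_count + 1))
    pages = set()
    for part in sequence.split(","):
        part = part.strip()
        if "-" in part:
            start, _, end = part.partition("-")
            # An open range such as "1-" runs to the last page.
            last = int(end) if end.strip() else page_count
            pages.update(range(int(start), last + 1))
        else:
            pages.add(int(part))
    # Drop page numbers that do not exist in the document.
    return sorted(p for p in pages if 1 <= p <= page_count)

# Placeholder bytes; in practice these would be the PDF file contents.
pdf_bytes = b"%PDF-1.7 placeholder"
params = {
    "input_data_type": "Base64 String",
    "base64_pdf_content": base64.b64encode(pdf_bytes).decode("ascii"),
    "document_name": "document.pdf",
    "expression": "%",
    "page_sequence": "1-",
}
print(validate_params(params))            # []
print(expand_page_sequence("1-", 2))      # [1, 2]
print(expand_page_sequence("1,3,5", 10))  # [1, 3, 5]
```

Validating up front keeps the "required if" rules in one place. In the node itself, n8n shows only the source field that matches the selected Input Data Type.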
Advanced Options
The following parameters are available in the Advanced Options section and are optional:
| Parameter | Type | Description | Example |
|---|---|---|---|
| Custom Profiles | String | Custom Configuration Profiles • Set additional options using custom profiles • JSON-like format containing predefined parameters • Enables advanced extraction processing settings • Optional for specialized requirements | { "outputDataFormat": "json" } |
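Custom Profiles is entered as a string, so a typo in the JSON is easy to miss. The sketch below assumes the field accepts strict JSON. It builds the string with json.dumps and parses it back to confirm the result is an object. Only the outputDataFormat key comes from the example above. Any other profile keys would have to come from the PDF4me documentation, and the helper names are hypothetical.

```python
import json

def build_custom_profiles(options: dict) -> str:
    # Serialize profile options into the string entered in Custom Profiles.
    return json.dumps(options)

def parse_custom_profiles(raw: str) -> dict:
    # Assumption: strict JSON is accepted; reject anything that is not a JSON object.
    value = json.loads(raw)
    if not isinstance(value, dict):
        raise ValueError("Custom Profiles must be a JSON object")
    return value

profile = build_custom_profiles({"outputDataFormat": "json"})
print(profile)                                             # {"outputDataFormat": "json"}
print(parse_custom_profiles(profile)["outputDataFormat"])  # json
```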
Output
Output Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| fileName | String | Name of the generated results file, including the extension and a timestamp that keeps names unique | text_extraction_results_1756999697398.json |
| mimeType | String | MIME type of the results file: typically application/json for structured data, or application/zip when multiple files are returned | application/json |
| fileSize | Number | Size of the results file in bytes, useful for storage planning and for monitoring file transfers | 106 |
| success | Boolean | true if the extraction succeeded, false if an error occurred; use it to branch into error handling | true |
| message | String | Status message: a confirmation on success, or error details for troubleshooting | Text extraction by expression completed successfully |
| docName | String | Original filename of the input PDF, kept for audit trails, debugging, and tracing results back to their source | document.pdf |
| expression | String | The expression applied during extraction, returned for verification and debugging | % |
| pageSequence | String | The page range that was scanned for the expression, returned for verification and troubleshooting | 1-2 |
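The success and message fields exist for error handling. In n8n, an IF node that tests {{ $json.success }} is often enough. If you post-process the output in a script instead, the sketch below shows one way to fail fast. The function name check_extraction_result and the shape of the returned summary are hypothetical. The sample item is copied from the example values in the table.

```python
def check_extraction_result(item: dict) -> dict:
    """Raise on an unsuccessful run; otherwise return the fields needed downstream."""
    if not item.get("success"):
        # 'message' carries the error details when success is false.
        raise RuntimeError(f"Extraction failed for {item.get('docName')}: {item.get('message')}")
    # Hypothetical summary shape for a downstream step.
    return {
        "file": item["fileName"],
        "bytes": item["fileSize"],
        "expression": item["expression"],
        "pages": item["pageSequence"],
    }

result = {
    "fileName": "text_extraction_results_1756999697398.json",
    "mimeType": "application/json",
    "fileSize": 106,
    "success": True,
    "message": "Text extraction by expression completed successfully",
    "docName": "document.pdf",
    "expression": "%",
    "pageSequence": "1-2",
}
print(check_extraction_result(result))
```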
n8n Action Response
The PDF4me Extract Text By Expression API returns a response that can be viewed in multiple formats. Choose the view that best fits your needs:
- JSON
- Table
- Schema
- Binary
JSON Response Format
The raw JSON response from the API:
```json
{
  "fileName": "text_extraction_results_1756999697398.json",
  "mimeType": "application/json",
  "fileSize": 106,
  "success": true,
  "message": "Text extraction by expression completed successfully",
  "docName": "document.pdf",
  "expression": "%",
  "pageSequence": "1-2"
}
```
Table View
Response data in a structured table format:
| Parameter | Value |
|---|---|
| fileName | text_extraction_results_1756999697398.json |
| mimeType | application/json |
| fileSize | 106 |
| success | true |
| docName | document.pdf |
| message | Text extraction by expression completed successfully |
| expression | % |
| pageSequence | 1-2 |
Schema View
The data structure and types of the response:
fileName: AB text_extraction_results_1756999697398.json
mimeType: AB application/json
fileSize: # 106
success: ✓ true
docName: AB document.pdf
message: AB Text extraction by expression completed successfully
expression: AB %
pageSequence: AB 1-2
Type Indicators:
AB = String, # = Number, ✓ = Boolean
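The type indicators in the Schema view can be turned into a simple check on the response. This sketch assumes the response has been parsed into a Python dict. It maps AB to str, # to int (fileSize is a whole number of bytes), and ✓ to bool. RESPONSE_SCHEMA and schema_errors are hypothetical names.

```python
import json

# Expected types read off the Schema view: AB -> str, # -> int, ✓ -> bool.
RESPONSE_SCHEMA = {
    "fileName": str,
    "mimeType": str,
    "fileSize": int,
    "success": bool,
    "docName": str,
    "message": str,
    "expression": str,
    "pageSequence": str,
}

def schema_errors(response: dict) -> list[str]:
    """List missing fields and type mismatches; an empty list means the response fits."""
    errors = []
    for field, expected in RESPONSE_SCHEMA.items():
        if field not in response:
            errors.append(f"missing field: {field}")
            continue
        value = response[field]
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or wrong_bool:
            errors.append(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    return errors

raw = ('{"fileName": "text_extraction_results_1756999697398.json", '
       '"mimeType": "application/json", "fileSize": 106, "success": true, '
       '"message": "Text extraction by expression completed successfully", '
       '"docName": "document.pdf", "expression": "%", "pageSequence": "1-2"}')
print(schema_errors(json.loads(raw)))  # []
```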
Binary Data View
The binary output holds the extracted results file under the binary property "data", with the following metadata:
- File Name: text_extraction_results_1756999697398.json
- File Extension: json
- Mime Type: application/json
- File Size: 106 bytes
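The reported file size gives a cheap integrity check on the results file. The sketch below assumes you already have the file's content as a base64 string; how you obtain it depends on your n8n setup and binary data mode. It also assumes the file is UTF-8 JSON, as its MIME type suggests. The structure inside the file is not documented on this page, so the payload and its "matches" key are made up for illustration.

```python
import base64
import json

def read_results_file(b64_content: str, expected_size: int):
    """Decode the base64 results file, compare its size, and parse it as JSON."""
    raw = base64.b64decode(b64_content)
    if len(raw) != expected_size:
        raise ValueError(f"size mismatch: got {len(raw)} bytes, expected {expected_size}")
    # Assumption: the file is UTF-8 encoded JSON (mimeType application/json).
    return json.loads(raw.decode("utf-8"))

# Hypothetical payload; the real file's JSON structure is not documented here.
payload = json.dumps({"matches": ["INV-20231"]}).encode("utf-8")
encoded = base64.b64encode(payload).decode("ascii")
print(read_results_file(encoded, len(payload)))  # {'matches': ['INV-20231']}
```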
Use Cases
Data Mining and Information Extraction
- Extract specific data patterns, contact information, and structured data from PDF documents with custom expressions for data mining and business intelligence
- Process legal documents, contracts, and official records by extracting specific clauses, terms, and legal information with targeted expressions
- Pull amounts, transaction details, and other financial data from financial reports, invoices, and accounting documents
Content Analysis and Research
- Extract research data, citations, and academic information from PDF documents with custom expressions for academic research and content analysis
- Process scientific papers, research documents, and technical publications by extracting specific data, measurements, and research findings
- Pull metrics, statistics, and business data from business reports, market research, and industry analyses
Compliance and Regulatory Processing
- Extract compliance data, regulatory information, and audit details from PDF documents with custom expressions for regulatory compliance and audit processing
- Process quality control documents, inspection reports, and certification materials by extracting compliance data, standards, and regulatory information
- Pull regulatory data, compliance metrics, and official information from permits, regulatory filings, and other official documents