
Extract Resources from PDF using n8n action

The PDF4me Extract Resources action extracts text content and embedded images from PDF documents in n8n automation workflows. It accepts PDFs passed along from earlier workflow nodes as binary data, base64 strings, or public URLs. You can target specific pages, extract text only, images only, or both, and receive structured output. Typical uses include content reuse, image extraction, text analysis, document digitization, content migration, and resource recovery.

Setup

Add the PDF4me "Extract Resources" node to your n8n workflow and configure the required parameters. For initial setup instructions, see our n8n Integration Guide.

Prerequisites:

  • PDF4me API credentials
  • n8n workflow access

Configuration:

  1. Add PDF4me node to workflow
  2. Select "Extract Resources" action
  3. Configure input parameters (see below)

Parameters

Complete list of parameters for the Extract Resources action. Configure these parameters to control resource extraction.

Important: Parameters marked with three asterisks (***) are required and must be provided for the action to function correctly.

| Parameter | Type | Description | Example |
|---|---|---|---|
| Input Data Type*** | String | PDF Input Format Selection. Choose the format of your PDF data input. PDF4me supports multiple input types. Options: Binary Data, Base64 String, or URL. | Binary Data |
| Input Binary Field | Binary | Binary PDF File Input (required if Binary Data). Reference the PDF file from a previous n8n node or file upload. PDF4me processes binary PDF files with automatic format detection. Required when Input Data Type is "Binary Data". | {{ $binary.data }} |
| Base64 PDF Content | String | Base64 Encoded PDF Input (required if Base64 String). Provide the PDF content as a base64 encoded string. PDF4me automatically decodes and processes the PDF content. Required when Input Data Type is "Base64 String". | JVBERi0x... |
| PDF URL | String | Public PDF URL Input (required if URL). Provide a publicly accessible URL to the PDF file. PDF4me downloads and processes the file from the URL. Required when Input Data Type is "URL". | https://abc.com/xyz.pdf |
| Document Name*** | String | Input Filename. Specify the name of the input PDF file. Used for format detection and processing optimization. Must include the .pdf extension. | document.pdf |
| Extract Text | Boolean | Text Extraction Toggle. Enable or disable text content extraction. When enabled, extracts all readable text content. Preserves formatting information. | true |
| Extract Images | Boolean | Image Extraction Toggle. Enable or disable image extraction. When enabled, extracts all embedded images and graphics. Visual elements are saved as separate files. | true |
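
As a rough illustration of the conditional requirements in the parameter list above, the sketch below checks a set of parameters before a workflow run and shows how raw PDF bytes could be base64-encoded. The dictionary keys (`inputDataType`, `binaryField`, `base64Content`, `pdfUrl`, `docName`) are hypothetical names chosen for this example; they are not the node's documented internal field names.

```python
import base64

# Hypothetical mapping from each Input Data Type option to the field it requires.
REQUIRED_FIELD = {
    "Binary Data": "binaryField",
    "Base64 String": "base64Content",
    "URL": "pdfUrl",
}


def validate_params(params: dict) -> list:
    """Return a list of problems; an empty list means the parameters look usable."""
    problems = []
    input_type = params.get("inputDataType")
    if input_type not in REQUIRED_FIELD:
        problems.append(f"Input Data Type must be one of {sorted(REQUIRED_FIELD)}")
    elif not params.get(REQUIRED_FIELD[input_type]):
        problems.append(f"{REQUIRED_FIELD[input_type]} is required for {input_type!r}")

    # Document Name is always required and must end in .pdf.
    doc_name = params.get("docName") or ""
    if not doc_name.lower().endswith(".pdf"):
        problems.append("Document Name must include the .pdf extension")
    return problems


def encode_pdf(pdf_bytes: bytes) -> str:
    """Base64-encode raw PDF bytes for the "Base64 String" input type."""
    return base64.b64encode(pdf_bytes).decode("ascii")
```

Because every PDF file starts with the bytes `%PDF-1`, a correctly encoded value begins with `JVBERi0x`, which matches the example shown for Base64 PDF Content.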

Advanced Options

The following parameters are available in the Advanced Options section and are optional:

| Parameter | Type | Description | Example |
|---|---|---|---|
| Pages | String | Page Range Specification. Define which pages to process for resource extraction. Use "all" for the entire document, specific numbers, or ranges. Examples: "1,3,5" (specific pages), "1-5,10-15" (ranges). | all |
| Custom Profiles | String | Custom Configuration Profiles. Set additional options using custom profiles. JSON-like format containing predefined parameters. Enables advanced extraction processing settings. Optional for specialized requirements. | { "outputDataFormat": "json" } |
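
To make the Pages syntax concrete, here is a minimal sketch of how a value such as "all", "1,3,5", or "1-5,10-15" could be expanded into individual page numbers. It assumes 1-based page numbering and inclusive ranges; the function `parse_pages` is a hypothetical helper for checking a value before sending it, not PDF4me's own parser, whose exact behavior is not described here.

```python
def parse_pages(spec: str, page_count: int) -> list:
    """Expand a Pages value such as "all", "1,3,5" or "1-5,10-15" into page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range like "10-15" covers both endpoints (assumption).
            start, end = (int(p) for p in part.split("-", 1))
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1-{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For a 20-page document, `parse_pages("1-5,10-15", 20)` yields pages 1 through 5 and 10 through 15, while a range beyond the last page raises `ValueError`.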

Output

Output Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
| fileName | String | PDF4me generated filename: the complete filename of the successfully processed document with proper extension and timestamp. PDF4me ensures unique naming and validates file format compliance for seamless integration with downstream processes. | extracted_resources_1756999816629.json |
| mimeType | String | PDF4me MIME type identifier: the standardized MIME type for the extracted content file, typically application/json for structured data or application/zip for multiple files. This ensures proper file handling and recognition across all systems and applications. | application/json |
| fileSize | Number | PDF4me file size in bytes: the exact size of the extracted content file in bytes, provided for storage planning, bandwidth optimization, and file transfer monitoring. Essential for enterprise document management and workflow automation. | 1945 |
| success | Boolean | PDF4me extraction status indicator: Boolean flag indicating the success or failure of the resource extraction process. Returns true for successful extractions and false for any errors, enabling robust error handling in automated workflows. | true |
| message | String | PDF4me extraction status message: descriptive message indicating the result of the resource extraction process. Provides clear status messages for successful extractions and detailed error information for troubleshooting purposes. | Resource extraction completed successfully |
| docName | String | PDF4me original document name reference: the original filename of the input PDF file that was processed. This reference is maintained for audit trails, debugging purposes, and tracking the source of extracted content in enterprise workflows. | document.pdf |

n8n Action Response

JSON Response Format

The raw JSON response returned by the PDF4me Extract Resources action:

{
  "fileName": "extracted_resources_1756999816629.json",
  "mimeType": "application/json",
  "fileSize": 1945,
  "success": true,
  "message": "Resource extraction completed successfully",
  "docName": "document.pdf"
}
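
A downstream step will usually branch on the `success` flag before using the other fields. The sketch below shows one way to do that with the sample response above; `summarize_response` is a hypothetical helper, and the failure handling (raising `RuntimeError` with the `message` text) is an assumption about how a workflow might choose to react, not documented PDF4me behavior.

```python
import json


def summarize_response(raw: str) -> str:
    """Parse the action's JSON response and return a one-line summary.

    Raises RuntimeError when the response reports a failed extraction.
    """
    data = json.loads(raw)
    if not data.get("success", False):
        raise RuntimeError(f"Extraction failed: {data.get('message', 'no message')}")
    return (
        f"{data['docName']} -> {data['fileName']} "
        f"({data['mimeType']}, {data['fileSize']} bytes)"
    )


# Sample response copied from the JSON Response Format section.
example = """{
  "fileName": "extracted_resources_1756999816629.json",
  "mimeType": "application/json",
  "fileSize": 1945,
  "success": true,
  "message": "Resource extraction completed successfully",
  "docName": "document.pdf"
}"""

print(summarize_response(example))
# prints: document.pdf -> extracted_resources_1756999816629.json (application/json, 1945 bytes)
```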

Use Cases

Content Management Systems: Extract text and images from PDF documents for centralized content repositories, enabling searchable databases of document content.

Data Processing Pipelines: Automatically extract structured data from PDF reports, invoices, and forms for integration with business intelligence tools and analytics platforms.

Document Analysis: Process large volumes of PDF documents to extract specific information for compliance monitoring, research, and automated reporting.

Archive Digitization: Convert legacy PDF archives into searchable, structured data for modern content management and knowledge discovery systems.
