Extract Resources - Text & Image Extractor API
PDF4me Extract Resources extracts text content and embedded images from PDF documents. The API receives the PDF through a REST call, with the file content Base64-encoded so that binary data can travel inside a JSON request body. Because text and image extraction can be enabled independently, the service suits content processing workflows and data extraction platforms.
Authenticating Your API Request
To access the PDF4me REST API, every request must include proper authentication credentials. Authentication ensures secure communication and validates your identity as an authorized user of the REST API.
Key Features
- Text Extraction: Extract all text content from PDF documents
- Image Extraction: Retrieve all images and visual elements from PDF documents
- Selective Extraction: Choose to extract text only, images only, or both based on your requirements
- Base64 Encoding: File content is sent as Base64 text inside the JSON payload (Base64 is an encoding for transport, not encryption)
- Simple API Integration: RESTful API designed for automated content extraction workflows
REST API Endpoint
The PDF4me REST API uses standard HTTP methods to interact with resources. All resource extraction operations are performed through a single endpoint:
- Method: POST
- Endpoint: /api/v2/ExtractResources
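A minimal Python sketch of how a client might assemble this POST request using only the standard library. The base URL `https://api.pdf4me.com` is an assumption inferred from the Location header shown in the asynchronous processing example, and `build_extract_request` is a hypothetical helper name. The function builds the request without sending it, so the payload and headers can be inspected first.

```python
import json
import urllib.request

# Assumption: base URL inferred from the Location header in the async example.
BASE_URL = "https://api.pdf4me.com"
ENDPOINT = "/api/v2/ExtractResources"


def build_extract_request(payload: dict, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the ExtractResources endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": auth_header,  # e.g. "Basic <Base64-encoded API key>"
        },
    )

# Sending it would look like:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```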
REST API Parameters
Complete list of parameters for the Extract Resources REST API, grouped into required and optional parameters.
Important: Parameters marked with an asterisk (*) are required and must be provided for the API to function correctly.
Required Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| docContent* | Base64 (String) | The content of the input PDF file encoded in Base64 format for resource extraction and content analysis processing | JVBERi... |
| docName* | String | Source PDF file name with proper .pdf extension for document identification and resource extraction processing | sample.pdf |
| extractText* | Boolean | Choose whether to extract text content from the PDF: true – Extract all text content with formatting, false – Skip text extraction for image-only processing | true |
| extractImages* | Boolean | Choose whether to extract images from the PDF: true – Extract images and visual elements, false – Skip image extraction for faster text-only processing | true |
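The four required parameters can be assembled into a request body as sketched below. `build_payload` is a hypothetical helper; the `.pdf` extension check is a client-side guard based on the docName description, not documented server behavior. The optional `async` flag is added only when requested.

```python
import base64


def build_payload(pdf_bytes: bytes, doc_name: str,
                  extract_text: bool = True,
                  extract_images: bool = True,
                  use_async: bool = False) -> dict:
    """Return a JSON-ready dict for the ExtractResources request body."""
    # Client-side guard (assumption): docName should carry a .pdf extension.
    if not doc_name.lower().endswith(".pdf"):
        raise ValueError("docName should end with .pdf")
    payload = {
        # Base64-encode the raw PDF bytes and store them as ASCII text.
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        "docName": doc_name,
        "extractText": extract_text,
        "extractImages": extract_images,
    }
    if use_async:
        payload["async"] = True  # optional: request 202 Accepted + polling
    return payload
```

For example, `build_payload(b"%PDF-1.4", "sample.pdf")["docContent"]` yields `"JVBERi0xLjQ="`, which matches the `JVBERi...` prefix in the parameter table.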
Optional Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| async | Boolean | Enable asynchronous processing. When true, the API returns 202 Accepted with a Location header for polling the result | true |
Output
The PDF4me Extract Resources REST API returns the extracted text and images as JSON. How the result is delivered depends on the processing mode (synchronous or asynchronous).
Synchronous Processing (Default)
When async is false or not provided, the API returns the extracted resources immediately.
Status Code: 200 OK
Response Format:
{
  "textList": ["extracted text content...", "more text..."],
  "imageList": [
    {
      "fileName": "image1.png",
      "imageContent": "base64-encoded-image-content..."
    },
    {
      "fileName": "image2.jpg",
      "imageContent": "base64-encoded-image-content..."
    }
  ]
}
The response contains extracted text as an array of strings and extracted images as an array of objects with file names and Base64-encoded content.
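A small sketch of turning a synchronous 200 response body into Python values. `parse_extract_response` and `ExtractedImage` are hypothetical names; treating a missing or null `textList`/`imageList` as an empty list is an assumption about what the API returns when one extraction flag is false.

```python
import base64
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExtractedImage:
    file_name: str
    data: bytes  # decoded image bytes


def parse_extract_response(body: str) -> Tuple[List[str], List[ExtractedImage]]:
    """Split a response body into extracted text strings and decoded images."""
    doc = json.loads(body)
    # Assumption: absent or null lists mean "nothing extracted".
    texts = list(doc.get("textList") or [])
    images = [
        ExtractedImage(item["fileName"], base64.b64decode(item["imageContent"]))
        for item in (doc.get("imageList") or [])
    ]
    return texts, images
```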
Asynchronous Processing
When async is true, the API accepts the request immediately and processes the document in the background.
Initial Response:
Status Code: 202 Accepted
Response Headers:
Location: https://api.pdf4me.com/api/v2/ExtractResourcesStatus/{operationId}
Response Body:
{
"traceId": "operation-trace-id"
}
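Because the first response is either a finished result (200) or an acknowledgement (202), a client needs to branch on the status code. The sketch below is a hypothetical `handle_initial_response` helper; it returns the parsed result for 200 and the polling URL from the Location header for 202, and treats any other status as an error.

```python
import json
from typing import Any, Dict, Tuple


def handle_initial_response(status: int, headers: Dict[str, str],
                            body: str) -> Tuple[str, Any]:
    """Classify the first ExtractResources response as done or pending."""
    if status == 200:
        return "done", json.loads(body)
    if status == 202:
        # HTTP header names are case-insensitive, so normalise before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        location = lowered.get("location")
        if not location:
            raise RuntimeError("202 Accepted without a Location header")
        return "pending", location
    raise RuntimeError(f"Unexpected status {status}: {body}")
```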
Polling for Results:
Use the Location header URL to poll for completion:
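// apiKey is your Base64-encoded API key; locationUrl comes from the 202 response.
// Assumption: the status URL keeps returning 202 while processing is pending.
let response;
do {
  await new Promise(resolve => setTimeout(resolve, 2000)); // wait 2 s between polls
  response = await fetch(locationUrl, {
    headers: { 'Authorization': 'Basic ' + apiKey }
  });
} while (response.status === 202);
if (response.status === 200) {
  const result = await response.json();
  // Process extracted resources
} else {
  throw new Error(`Polling failed with status ${response.status}`);
}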
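The same polling loop in Python, with a retry limit so a stuck job cannot poll forever. This sketch assumes the status URL returns 202 while the job is pending; `poll_until_done`, the 2-second interval and the 30-attempt limit are illustrative choices. The HTTP call is passed in as `fetch` (returning status and body) so any client library can be plugged in.

```python
import time
from typing import Callable, Tuple


def poll_until_done(fetch: Callable[[str], Tuple[int, str]], location_url: str,
                    interval_s: float = 2.0, max_attempts: int = 30,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll location_url until it returns 200 and return the response body."""
    for _ in range(max_attempts):
        status, body = fetch(location_url)
        if status == 200:
            return body
        # Assumption: 202 means "still processing"; anything else is a failure.
        if status != 202:
            raise RuntimeError(f"Polling failed with status {status}: {body}")
        sleep(interval_s)
    raise TimeoutError(f"No result after {max_attempts} attempts")
```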
Error Responses
| Status Code | Description | Example Response |
|---|---|---|
| 400 Bad Request | Invalid request parameters or missing required fields | {"error": "Missing required parameter: docContent"} |
| 401 Unauthorized | Invalid or missing API key | {"error": "Unauthorized"} |
| 408 Request Timeout | Request processing timeout | {"error": "Request timeout"} |
| 500 Internal Server Error | Server error during processing | {"error": "Internal server error"} |
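One way to surface these errors in client code, sketched in Python. It reads the `error` field shown in the example responses and falls back to the raw body when the body is not JSON. Marking 408 and 500 as retryable is an assumption about sensible client behavior, not documented API guidance; `ExtractResourcesError` and `check_response` are hypothetical names.

```python
import json

# Assumption: timeouts and server errors may succeed if retried later.
RETRYABLE = {408, 500}


class ExtractResourcesError(Exception):
    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.retryable = status in RETRYABLE


def check_response(status: int, body: str) -> None:
    """Raise ExtractResourcesError for non-2xx responses."""
    if 200 <= status < 300:
        return
    try:
        # Error bodies are shown as {"error": "..."}; fall back to the raw text.
        message = json.loads(body).get("error", body)
    except (ValueError, AttributeError):
        message = body
    raise ExtractResourcesError(status, message)
```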
Understanding the JSON Response
The resource extraction response is a JSON object containing:
- textList: Array of strings containing extracted text content (if extractText is true)
- imageList: Array of image objects (if extractImages is true), each containing:
  - fileName: The name of the extracted image file
  - imageContent: Base64-encoded content of the image
Decoding Base64 Image Content:
To decode and save Base64-encoded images:
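// JavaScript (browser): convert the Base64 string to raw bytes before building the Blob
const bytes = Uint8Array.from(atob(base64String), c => c.charCodeAt(0));
const blob = new Blob([bytes], { type: 'image/png' }); // use the MIME type matching fileName
const url = URL.createObjectURL(blob);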
# Python: base64_string holds one imageContent value from imageList
import base64
image_data = base64.b64decode(base64_string)
with open('extracted_image.png', 'wb') as f:
    f.write(image_data)
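The snippets above write every image as PNG under a fixed name. A sketch that instead keeps each image's own `fileName` (so a JPG stays a JPG) and strips directory components so a file name such as `../x.png` cannot write outside the target folder; `save_images` and the `imageN.bin` fallback name are hypothetical.

```python
import base64
from pathlib import Path
from typing import Iterable, List


def save_images(image_list: Iterable[dict], out_dir: str) -> List[Path]:
    """Decode each imageList entry and write it to out_dir; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, item in enumerate(image_list, start=1):
        # Keep only the final path component; fall back to a numbered name.
        name = Path(item.get("fileName") or "").name
        if name in ("", ".", ".."):
            name = f"image{index}.bin"
        target = out / name
        target.write_bytes(base64.b64decode(item["imageContent"]))
        saved.append(target)
    return saved
```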
Request Example
Header
Content-Type: application/json
Authorization: Basic YOUR_BASE64_ENCODED_API_KEY
Note:
- Get your API key from the PDF4me Dashboard
- The API key must be Base64 encoded and prefixed with "Basic " in the Authorization header
- Example: If your API key is abc123, encode it to Base64 and use Authorization: Basic YWJjMTIz
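The header value described in the note can be computed with Python's standard `base64` module. `basic_auth_header` is a hypothetical helper name.

```python
import base64


def basic_auth_header(api_key: str) -> str:
    """Base64-encode the API key and prefix it with 'Basic '."""
    token = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return "Basic " + token
```

For the example key, `basic_auth_header("abc123")` returns `"Basic YWJjMTIz"`, matching the value in the note.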
Payload
Basic Request:
{
"docContent": "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCjIgMCBvYmoKPDwKL1R5cGUgL1BhZ2VzCi9LaWRzIFszIDAgUl0KL0NvdW50IDEKPD4KZW5kb2JqCjMgMCBvYmoKPDwKL1R5cGUgL1BhZ2UKL1BhcmVudCAyIDAgUgovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Ci9Gb250IDw8Ci9GMSA0IDAgUgo+Pgo+PgovQ29udGVudHMgNSAwIFIKPj4KZW5kb2JqCjQgMCBvYmoKPDwKL1R5cGUgL0ZvbnQKL1N1YnR5cGUgL1R5cGUxCi9CYXNlRm9udCAvSGVsdmV0aWNhCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9MZW5ndGggNDQKPj4Kc3RyZWFtCkJUCi9GMSAxMiBUZgoxMDAgNzAwIFRkCihIZWxsbyBXb3JsZCkgVGoKRVQKZW5kc3RyZWFtCmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDkgMDAwMDAgbiAKMDAwMDAwMDA1NCAwMDAwMCBuIAowMDAwMDAwMTAxIDAwMDAwIG4gCjAwMDAwMDAxNzAgMDAwMDAgbiAKMDAwMDAwMDI0NCAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKPj4Kc3RhcnR4cmVmCjM0MQolJUVPRg==",
"docName": "sample.pdf",
"extractText": true,
"extractImages": true
}
With Asynchronous Processing:
{
"docContent": "JVBERi0xLjQKJeLjz9MK...",
"docName": "sample.pdf",
"extractText": true,
"extractImages": true,
"async": true
}
Code Samples
PDF4me provides code samples for the Extract Resources REST API in the following languages and platforms:
- C#
- Java
- JavaScript
- Python
- Salesforce
- n8n
- Google Script
- AWS Lambda
Image Extraction Features
- High-quality Images: Extract images with original resolution and quality
- Multiple Formats: Support for various image formats (JPG, PNG, GIF, etc.)
- Vector Graphics: Extract scalable vector graphics and charts
- Metadata Preservation: Maintain image properties and metadata
Advanced Processing
- Selective Extraction: Choose between text-only, image-only, or combined extraction
- Batch Processing: Process multiple PDFs by sending one request per document
- Content Analysis: Analyze and categorize extracted content types
- Professional Results: High-quality extraction suitable for enterprise applications
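Since each request carries a single `docContent`, batch processing amounts to one call per file. A minimal sequential sketch; `extract_batch` is hypothetical, and `send` stands in for whatever function actually POSTs the payload and returns the parsed JSON. Results are keyed by full path so files with the same name in different folders do not overwrite each other.

```python
import base64
from pathlib import Path
from typing import Callable, Dict, Iterable


def extract_batch(pdf_paths: Iterable[str], send: Callable[[dict], dict],
                  extract_text: bool = True,
                  extract_images: bool = True) -> Dict[str, dict]:
    """Send one ExtractResources request per PDF and collect results by path."""
    results: Dict[str, dict] = {}
    for p in pdf_paths:
        path = Path(p)
        payload = {
            "docContent": base64.b64encode(path.read_bytes()).decode("ascii"),
            "docName": path.name,
            "extractText": extract_text,
            "extractImages": extract_images,
        }
        # `send` is a hypothetical callable that POSTs the payload.
        results[str(path)] = send(payload)
    return results
```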
Industry Use Cases & Applications
Document Management & Processing Use Cases
- Content Digitization: Extract text and images from scanned documents for digital archives
- Document Analysis: Analyze PDF content for information extraction and processing
- Data Migration: Extract content during document migration and system updates
- Content Indexing: Create searchable indexes from PDF text and image content
Business & Finance Use Cases
- Invoice Processing: Extract text and images from invoices for automated processing
- Report Analysis: Extract data and charts from financial reports and statements
- Contract Management: Extract text and visual elements from contracts and agreements
- Compliance Documentation: Process regulatory documents and extract required information
Legal & Compliance Use Cases
- Legal Document Processing: Extract text and evidence from legal documents
- Case Management: Extract content from case files and legal briefs
- Regulatory Compliance: Process compliance documents and extract required data
- Evidence Collection: Extract text and images for legal evidence and documentation
Healthcare & Medical Use Cases
- Medical Records: Extract text and images from patient records and medical reports
- Research Data: Process research documents and extract data and charts
- Clinical Trials: Extract information from clinical study documents
- Medical Imaging: Extract medical images and diagnostic content from PDFs
Education & Research Use Cases
- Academic Papers: Extract text and figures from research papers and publications
- Educational Content: Process textbooks and educational materials
- Research Data: Extract data and charts from research documents
- Library Digitization: Digitize library collections and extract searchable content
Marketing & Content Use Cases
- Content Creation: Extract text and images for content marketing and social media
- Brand Assets: Extract logos and branding elements from PDF documents
- Design Resources: Extract design elements and graphics for creative projects
- Content Repurposing: Extract content for reuse in different formats and platforms