
Extract Resources - Text & Image Extractor API

PDF4me Extract Resources extracts text content and embedded images from PDF documents. The API receives the PDF through a REST call, with the file content sent as a Base64-encoded string in a JSON body. Base64 is a transport encoding, not encryption; requests are protected by the authentication described below. You can extract text only, images only, or both, which makes the service a fit for content processing workflows and data extraction platforms.

Authenticating Your API Request

Every request to the PDF4me REST API must include authentication credentials. The API key is sent Base64-encoded in the Authorization header using the Basic scheme, which identifies you as an authorized user of the API.

Key Features

  • Text Extraction: Extract all text content from PDF documents
  • Image Extraction: Retrieve all images and visual elements from PDF documents
  • Selective Extraction: Choose to extract text only, images only, or both based on your requirements
  • Base64 Encoding: File content is transmitted as a Base64-encoded string inside the JSON request body
  • Simple API Integration: RESTful API designed for automated content extraction workflows

REST API Endpoint

The PDF4me REST API uses standard HTTP methods to interact with resources. All resource extraction operations are performed through a single endpoint:

  • Method: POST
  • Endpoint: /api/v2/ExtractResources
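
As a minimal sketch, the request can be assembled with Python's standard library. The base URL `https://api.pdf4me.com` is an assumption (this page gives only the path), and `build_extract_request` is a hypothetical helper name, not part of any PDF4me SDK.

```python
import json
import urllib.request

# Assumed base URL; the documentation above specifies only the path.
BASE_URL = "https://api.pdf4me.com"
ENDPOINT_PATH = "/api/v2/ExtractResources"


def build_extract_request(payload: dict, authorization: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Extract Resources endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + ENDPOINT_PATH,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": authorization,
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Building the request separately from sending it keeps the URL, method, and headers easy to inspect.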

REST API Parameters

The parameters of the Extract Resources REST API are listed below, grouped into required and optional.

Important: Parameters marked with an asterisk (*) are required and must be provided for the API to function correctly.

Required Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
| docContent* | Base64 (String) | The content of the input PDF file, encoded in Base64 | JVBERi... |
| docName* | String | Source PDF file name, including the .pdf extension | sample.pdf |
| extractText* | Boolean | Whether to extract text content from the PDF. true: extract all text content with formatting; false: skip text extraction for image-only processing | true |
| extractImages* | Boolean | Whether to extract images from the PDF. true: extract images and visual elements; false: skip image extraction for faster text-only processing | true |

Optional Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
| async | Boolean | Enable asynchronous processing. When true, the API returns 202 Accepted with a Location header for polling the result | true |

Output

The response depends on the processing mode. Extracted text and images are returned as JSON.

Synchronous Processing (Default)

When async is false or not provided, the API returns the extracted resources immediately.

Status Code: 200 OK

Response Format:

{
  "textList": ["extracted text content...", "more text..."],
  "imageList": [
    {
      "fileName": "image1.png",
      "imageContent": "base64-encoded-image-content..."
    },
    {
      "fileName": "image2.jpg",
      "imageContent": "base64-encoded-image-content..."
    }
  ]
}

The response contains extracted text as an array of strings and extracted images as an array of objects with file names and Base64-encoded content.
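
One way to consume this response is sketched below in Python: the text entries are joined and each image is decoded and written to disk. The helper name `save_extracted_resources` is hypothetical, and the handling of a missing or null `textList` or `imageList` (for example when one extraction type is switched off) is a defensive assumption rather than documented behavior.

```python
import base64
from pathlib import Path


def save_extracted_resources(response: dict, out_dir: str) -> tuple[str, list[Path]]:
    """Join the textList entries and write each imageList entry into out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Assumption: a key may be absent or null when that extraction is disabled.
    text = "\n".join(response.get("textList") or [])

    saved = []
    for image in response.get("imageList") or []:
        # Keep only the final path component so a file name cannot escape out_dir.
        name = Path(image["fileName"]).name
        path = target / name
        path.write_bytes(base64.b64decode(image["imageContent"]))
        saved.append(path)
    return text, saved
```

The function takes the already-parsed JSON (for example the result of `json.loads` on the response body) and returns the combined text together with the paths of the written image files.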

Request Example

Content-Type: application/json
Authorization: Basic YOUR_BASE64_ENCODED_API_KEY

Note:

  • Get your API key from the PDF4me Dashboard
  • The API key must be Base64 encoded and prefixed with "Basic " in the Authorization header
  • Example: If your API key is abc123, encode it to Base64 and use Authorization: Basic YWJjMTIz
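
The encoding step described in the note can be sketched in Python as follows. The function name `build_authorization_header` is hypothetical; the logic is just Base64 encoding of the key followed by the "Basic " prefix.

```python
import base64


def build_authorization_header(api_key: str) -> str:
    """Return the Authorization header value: 'Basic ' + Base64(api_key)."""
    encoded = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return f"Basic {encoded}"


print(build_authorization_header("abc123"))  # prints: Basic YWJjMTIz
```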

Payload

Basic Request:

{
"docContent": "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCjIgMCBvYmoKPDwKL1R5cGUgL1BhZ2VzCi9LaWRzIFszIDAgUl0KL0NvdW50IDEKPD4KZW5kb2JqCjMgMCBvYmoKPDwKL1R5cGUgL1BhZ2UKL1BhcmVudCAyIDAgUgovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Ci9Gb250IDw8Ci9GMSA0IDAgUgo+Pgo+PgovQ29udGVudHMgNSAwIFIKPj4KZW5kb2JqCjQgMCBvYmoKPDwKL1R5cGUgL0ZvbnQKL1N1YnR5cGUgL1R5cGUxCi9CYXNlRm9udCAvSGVsdmV0aWNhCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9MZW5ndGggNDQKPj4Kc3RyZWFtCkJUCi9GMSAxMiBUZgoxMDAgNzAwIFRkCihIZWxsbyBXb3JsZCkgVGoKRVQKZW5kc3RyZWFtCmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDkgMDAwMDAgbiAKMDAwMDAwMDA1NCAwMDAwMCBuIAowMDAwMDAwMTAxIDAwMDAwIG4gCjAwMDAwMDAxNzAgMDAwMDAgbiAKMDAwMDAwMDI0NCAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKPj4Kc3RhcnR4cmVmCjM0MQolJUVPRg==",
"docName": "sample.pdf",
"extractText": true,
"extractImages": true
}
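
A payload like the one above can be produced from raw PDF bytes with a short Python sketch. `build_payload` is a hypothetical helper; the field names come from the parameter tables, and the `async` key is added only when requested.

```python
import base64


def build_payload(pdf_bytes: bytes, doc_name: str,
                  extract_text: bool = True, extract_images: bool = True,
                  run_async: bool = False) -> dict:
    """Build the JSON-serializable request body for Extract Resources."""
    payload = {
        # The API expects the file content as a Base64 string.
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        "docName": doc_name,
        "extractText": extract_text,
        "extractImages": extract_images,
    }
    if run_async:
        # Optional parameter; omitted entirely for synchronous processing.
        payload["async"] = True
    return payload
```

For a file on disk, the bytes can be read with `Path("sample.pdf").read_bytes()` from `pathlib` and passed in as `pdf_bytes`.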

With Asynchronous Processing:

{
"docContent": "JVBERi0xLjQKJeLjz9MK...",
"docName": "sample.pdf",
"extractText": true,
"extractImages": true,
"async": true
}
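
With async set to true, the client has to poll the URL from the Location header. The Python sketch below shows one possible polling loop. It assumes, without confirmation from this page, that the polling URL answers 202 while processing continues and 200 with the JSON result once finished. `poll_for_result` and the injected `fetch` callable are hypothetical names; `fetch` stands in for whatever authenticated HTTP GET the client uses.

```python
import time
from typing import Callable, Tuple

# fetch(url) -> (status_code, body_bytes); supplied by the caller.
Fetch = Callable[[str], Tuple[int, bytes]]


def poll_for_result(location: str, fetch: Fetch,
                    interval_seconds: float = 2.0,
                    max_attempts: int = 30) -> bytes:
    """Poll the Location URL until the result is ready or attempts run out."""
    for _ in range(max_attempts):
        status, body = fetch(location)
        if status == 200:
            # Assumed: 200 carries the same JSON as the synchronous response.
            return body
        if status != 202:
            raise RuntimeError(f"Unexpected status {status} while polling")
        # Assumed: 202 means the job is still running; wait and retry.
        time.sleep(interval_seconds)
    raise TimeoutError(f"No result after {max_attempts} polling attempts")
```

Passing the HTTP call in as a function keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub.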

Code Samples

Code samples for the PDF4me Extract Resources REST API are available in multiple programming languages. Choose the language that best fits your development environment:

C# (CSharp) Sample

Complete C# implementation for extracting resources from PDF:

Image Extraction Features

  • High-quality Images: Extract images with original resolution and quality
  • Multiple Formats: Support for various image formats (JPG, PNG, GIF, etc.)
  • Vector Graphics: Extract scalable vector graphics and charts
  • Metadata Preservation: Maintain image properties and metadata

Advanced Processing

  • Selective Extraction: Choose between text-only, image-only, or combined extraction
  • Batch Processing: Process multiple PDFs and extract resources efficiently
  • Content Analysis: Analyze and categorize extracted content types
  • Professional Results: High-quality extraction suitable for enterprise applications

Industry Use Cases & Applications

Document Management & Processing Use Cases

  • Content Digitization: Extract text and images from scanned documents for digital archives
  • Document Analysis: Analyze PDF content for information extraction and processing
  • Data Migration: Extract content during document migration and system updates
  • Content Indexing: Create searchable indexes from PDF text and image content
