PDF OCR - Scanned Document to Text Converter

PDF4me Convert PDF to Editable PDF using OCR converts image-based PDF documents into fully editable PDFs using OCR technology. The API service processes scanned PDFs and image-based documents, extracting text so that the documents become searchable and editable. It receives the PDF content and OCR parameters through REST API calls, with the file content Base64-encoded so that binary data can be carried in the JSON request body. With support for quality control, conditional OCR processing, and language specification, it suits document digitization and content accessibility workflows.

Authenticating Your API Request

To access the PDF4me REST API, every request must include proper authentication credentials. Authentication ensures secure communication and validates your identity as an authorized user of the REST API.

Key Features

  • OCR Processing: Convert image-based PDFs to fully editable documents with text recognition
  • Quality Control: Choose between Draft (1 API call per file) and High (2 API calls per page) quality processing
  • Intelligent Processing: Skip OCR when text is already searchable to optimize performance
  • Language Support: Specify document language for improved text recognition accuracy
  • Base64 Encoding: File content is transmitted as a Base64-encoded string in the JSON payload
  • Simple API Integration: RESTful API designed for automated PDF OCR workflows
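The Quality Control figures above (1 API call per file for Draft, 2 API calls per page for High) lend themselves to a quick cost estimate. The following is a minimal sketch that applies those two figures only. The function name estimateApiCalls is hypothetical and not part of PDF4me, and actual consumption may differ, for example when OCR is skipped because the text is already searchable.

```javascript
// Hypothetical helper: estimates the API calls consumed by one OCR request,
// using only the per-file and per-page figures quoted in the feature list.
function estimateApiCalls(qualityType, pageCount) {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new RangeError("pageCount must be a positive integer");
  }
  switch (qualityType) {
    case "Draft":
      return 1; // 1 API call per file, regardless of page count
    case "High":
      return 2 * pageCount; // 2 API calls per page
    default:
      throw new Error(`Unknown qualityType: ${qualityType}`);
  }
}

console.log(estimateApiCalls("Draft", 40)); // 1
console.log(estimateApiCalls("High", 40)); // 80
```

For a 40-page scan, the High setting would therefore be estimated at 80 calls against 1 for Draft.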

REST API Endpoint

The PDF4me REST API uses standard HTTP methods to interact with resources. All OCR-based PDF conversion operations are performed through a single endpoint:

  • Method: POST
  • Endpoint: /api/v2/ConvertOcrPdf

REST API Parameters

The tables below list all parameters for the Convert PDF to editable PDF using OCR REST API, grouped into required and optional parameters.

Important: Parameters marked with an asterisk (*) are required and must be provided for the API to function correctly.

Required Parameters

Parameter | Type | Description | Example
docContent* | Base64 | The content of the input PDF file encoded in Base64 format | JVBERi...
docName* | String | Source PDF file name with .pdf extension | output.pdf
qualityType* | String | OCR quality type: Draft (1 API call per file) for normal PDFs, High (2 API calls per page) for scanned documents | Draft
ocrWhenNeeded* | String | OCR only when needed: true to skip if text is already searchable, false to always perform OCR | true
outputFormat* | String | Output format (must be "true" for standard output) | true
isAsync* | Boolean | Enable asynchronous processing. When true, the API returns a 202 status and provides a polling URL in the Location header | true

Optional Parameters

Parameter | Type | Description | Example
language | String | Language of text in source file (e.g., "English", "Spanish", "French", "German"). Only use if output is not recognizable | English
mergeAllSheets | Boolean | Merge all sheets if applicable | true

Output

The PDF4me Convert PDF to Editable PDF using OCR REST API returns different responses depending on the processing mode. In synchronous mode, the converted PDF is returned as a Base64-encoded string inside a JSON body.

Synchronous Processing (Default)

When isAsync is false, the API processes the request immediately:

Status Code: 200 OK

Content-Type: application/json

Response Body:

{
"docName": "output.pdf",
"docContent": "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCjIgMCBvYmoKPDwKL1R5cGUgL1BhZ2VzCi9LaWRzIFszIDAgUl0KL0NvdW50IDEKPD4KZW5kb2JqCjMgMCBvYmoKPDwKL1R5cGUgL1BhZ2UKL1BhcmVudCAyIDAgUgovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Ci9Gb250IDw8Ci9GMSA0IDAgUgo+Pgo+PgovQ29udGVudHMgNSAwIFIKPj4KZW5kb2JqCjQgMCBvYmoKPDwKL1R5cGUgL0ZvbnQKL1N1YnR5cGUgL1R5cGUxCi9CYXNlRm9udCAvSGVsdmV0aWNhCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9MZW5ndGggNDQKPj4Kc3RyZWFtCkJUCi9GMSAxMiBUZgoxMDAgNzAwIFRkCihIZWxsbyBXb3JsZCkgVGoKRVQKZW5kc3RyZWFtCmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDkgMDAwMDAgbiAKMDAwMDAwMDA1NCAwMDAwMCBuIAowMDAwMDAwMTAxIDAwMDAwIG4gCjAwMDAwMDAxNzAgMDAwMDAgbiAKMDAwMDAwMDI0NCAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKPj4Kc3RhcnR4cmVmCjM0MQolJUVPRg=="
}

Response Fields:

  • docName (string): The output PDF file name
  • docContent (string): The converted editable PDF, encoded as Base64 string

How to Use:

  1. Extract the docContent field from the JSON response
  2. Decode the Base64 string to get the binary PDF data
  3. Save or process the PDF file as needed

Example (JavaScript):

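// url and options hold the endpoint URL and the fetch settings (method, headers, body)
const response = await fetch(url, options);
const data = await response.json();
// atob returns a binary string, so map each character to its byte value
const pdfBytes = Uint8Array.from(atob(data.docContent), (c) => c.charCodeAt(0));
// Save or process pdfBytes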
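The example above covers the synchronous case. For isAsync set to true, the parameter description states that the API returns a 202 status with a polling URL in the Location header. The following is a minimal sketch of that flow for Node.js. It assumes, without confirmation from this page, that the polling URL answers a GET request with 202 while the job is pending and with 200 plus the same JSON body once it is done. The name convertWithPolling, the injected fetchFn argument, and the interval and attempt limits are all illustrative.

```javascript
// Sketch only: submit the request and, on a 202 response, poll the URL from
// the Location header. fetchFn is injected so the flow can be tried offline.
async function convertWithPolling(fetchFn, url, options, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  let response = await fetchFn(url, options);
  let pollUrl = null;

  for (let attempt = 0; response.status === 202 && attempt < maxAttempts; attempt++) {
    const location = response.headers.get("Location");
    // Resolve relative Location values against the original request URL.
    if (location) pollUrl = new URL(location, url).toString();
    if (!pollUrl) throw new Error("202 response without a Location header");

    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    // Assumption: polling is a GET that reuses the original request headers.
    response = await fetchFn(pollUrl, { method: "GET", headers: options.headers });
  }

  if (response.status !== 200) {
    throw new Error(`OCR conversion not completed, last status ${response.status}`);
  }
  const data = await response.json();
  return Buffer.from(data.docContent, "base64"); // raw PDF bytes (Node.js Buffer)
}
```

In Node.js 18 or later, passing the global fetch as fetchFn should be enough to try this against a real request; the returned Buffer can then be written to disk or processed further.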

Request Example

Content-Type: application/json
Authorization: Basic YOUR_BASE64_ENCODED_API_KEY

Note: The API key must be Base64 encoded. Get your API key from the PDF4me Dashboard.

Payload

{
"docContent": "JVBERi...",
"docName": "output.pdf",
"qualityType": "Draft",
"ocrWhenNeeded": "true",
"language": "English",
"outputFormat": "true",
"isAsync": true
}
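The headers and payload above can be assembled programmatically. The following is a minimal Node.js sketch, not an official client: buildOcrRequest and its argument names are illustrative, the base URL is a placeholder because this page gives only the endpoint path, and the API key is assumed to be already Base64 encoded as the note above requires.

```javascript
// Sketch only: builds the URL and fetch options for the OCR endpoint.
// buildOcrRequest and its argument names are hypothetical.
function buildOcrRequest({ baseUrl, encodedApiKey, pdfBytes, docName, language }) {
  const payload = {
    docContent: Buffer.from(pdfBytes).toString("base64"), // PDF bytes as Base64
    docName,
    qualityType: "Draft",
    ocrWhenNeeded: "true", // sent as a string, matching the payload above
    outputFormat: "true", // sent as a string, matching the payload above
    isAsync: true,
  };
  if (language) payload.language = language; // optional parameter

  return {
    url: new URL("/api/v2/ConvertOcrPdf", baseUrl).toString(),
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Basic ${encodedApiKey}`,
      },
      body: JSON.stringify(payload),
    },
  };
}

// Usage with placeholder values; pass url and options to fetch() to send it.
const { url, options } = buildOcrRequest({
  baseUrl: "https://example.invalid", // placeholder, not the real host
  encodedApiKey: "YOUR_BASE64_ENCODED_API_KEY",
  pdfBytes: new TextEncoder().encode("%PDF-1.4"), // stand-in for real file bytes
  docName: "output.pdf",
  language: "English",
});
console.log(url);
```

Keeping the request construction in a pure function makes it easy to inspect the JSON body before anything is sent over the network.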

Code Samples

The PDF4me Convert PDF to editable PDF using OCR REST API provides code samples in multiple programming languages. Choose the language that best fits your development environment:

C# (CSharp) Sample

Complete C# implementation for converting PDF to editable PDF using OCR:

OCR Conversion Features

Text Recognition Processing

  • Advanced OCR Engine: Optical character recognition for accurate text extraction
  • Quality Options: Draft quality for normal PDFs, High quality for scanned documents
  • Intelligent Detection: Automatic detection of searchable vs. image-based content
  • Professional Results: High-quality text recognition with accurate character extraction
  • Advanced Processing: Support for complex document layouts and formatting

Language Support

  • Multi-language Recognition: Support for various languages and character sets
  • Language Specification: Optional language parameter for improved accuracy
  • Character Recognition: Accurate recognition of different alphabets and symbols
  • Professional Processing: High-quality language-specific text recognition
  • Flexible Input: Support for documents in multiple languages

Advanced Features

  • Smart Processing: Intelligent OCR processing that skips unnecessary operations
  • Format Flexibility: Support for various PDF formats and document types
  • Professional Conversion: High-quality document conversion with clear text output
  • Flexible Options: Customizable processing parameters for specific requirements

Industry Use Cases & Applications

Legal & Professional Services Use Cases

  • Legal Document Processing: Convert scanned legal documents to searchable, editable formats
  • Legal Archive Processing: Digitize historical legal documents and archives
  • Compliance Documentation: Convert regulatory documents to searchable formats for compliance monitoring
  • Legal Document Accessibility: Make legal PDFs accessible for text search and editing
