
Extract Text by Expression - RegEx Search API

PDF4me Extract Text by Expression extracts text from PDF documents by matching regular expressions. The API receives the PDF content as a Base64 string, together with a regular expression pattern, through a REST call and returns every piece of text that matches the pattern. Base64 is a transport encoding that lets binary PDF data travel inside a JSON body; it does not encrypt the content. With support for complex regular expressions and flexible page targeting, the service suits document processing and data extraction workflows.

Authenticating Your API Request

Every request to the PDF4me REST API must include authentication credentials, which identify you as an authorized user. The required Authorization header is described under Request Example.

Key Features

  • Regular Expression Support: Extract text using regex patterns for precise text matching
  • Flexible Page Targeting: Process specific pages or entire documents with custom page sequences
  • Pattern Matching: Support for complex regular expressions and pattern recognition
  • Base64 Encoding: File content is transmitted as a Base64 string inside the JSON request body
  • Simple API Integration: RESTful API designed for automated text extraction workflows

REST API Endpoint

The PDF4me REST API uses standard HTTP methods to interact with resources. All Extract Text by Expression operations go through a single endpoint:

  • Method: POST
  • Endpoint: /api/v2/ExtractTextByExpression

REST API Parameters

Complete list of parameters for the Extract Text by Expression REST API. Parameters are organized by category for better understanding and implementation.

Important: Parameters marked with an asterisk (*) are required and must be provided for the API to function correctly.

Required Parameters

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| docContent* | Base64 (String) | The content of the input PDF file in Base64 format | JVBERi... |
| docName* | String | Source PDF file name with proper file extension | output.pdf |
| expression* | String | Regular expression pattern for text extraction. Supports standard regex syntax including groups, quantifiers, and anchors | % or [A-Za-z]+ |
| pageSequence* | String | Specify which pages to process. Use "1-" for all pages, "1-3" for a range, or "1,2,3" for specific pages | 1-3 |
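The four required fields can be assembled as in the following minimal sketch. The helper name `build_payload` and the default page sequence of "1-" are illustrative choices, not part of the PDF4me API; only the field names come from the table above.

```python
import base64
import json


def build_payload(pdf_bytes: bytes, doc_name: str, expression: str,
                  page_sequence: str = "1-") -> dict:
    """Assemble the four required fields of the request body."""
    return {
        # Binary PDF data must be Base64-encoded to fit in a JSON string.
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        "docName": doc_name,
        "expression": expression,
        "pageSequence": page_sequence,
    }


# Placeholder bytes stand in for a real file read with open(path, "rb").
payload = build_payload(b"%PDF-1.4 ...", "output.pdf", r"\d+", "1-3")

# json.dumps escapes the backslash, so the pattern is sent as "\\d+".
body = json.dumps(payload)
```

Building the body with a JSON library rather than string concatenation avoids escaping mistakes in patterns that contain backslashes or quotes.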

Optional Parameters

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| async | Boolean | Enable asynchronous processing. When true, the API returns 202 Accepted with a Location header for polling the result | true |
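A polling loop for the asynchronous mode might look like the sketch below. This section only states that the API answers 202 Accepted with a Location header; the sketch assumes that the Location URL keeps answering 202 while the job is pending and 200 with the result once it is done. The `fetch` callable is hypothetical: it stands for whatever HTTP client you use and must return a status code and a body.

```python
import time
from typing import Callable, Tuple


def poll_until_done(location: str,
                    fetch: Callable[[str], Tuple[int, bytes]],
                    interval: float = 2.0,
                    max_attempts: int = 30) -> bytes:
    """Call fetch(location) until it reports 200, then return the body.

    Assumption: 202 means "still processing"; any other status is an error.
    """
    for _ in range(max_attempts):
        status, body = fetch(location)
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

Passing the HTTP call in as a parameter keeps the retry logic independent of any particular client library and makes it easy to test with a stub.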

Output

The PDF4me Extract Text by Expression REST API returns the extracted text matches as JSON. How the result is delivered depends on the processing mode.

Synchronous Processing (Default)

When async is false or not provided, the API returns the extracted text matches immediately.

Status Code: 200 OK

Response Format:

{
"textList": ["extracted text 1", "extracted text 2", "extracted text 3"]
}

The response contains an array of text strings matching the specified regular expression pattern.
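Reading the matches out of a response body can be sketched as follows. The function name `parse_matches` is illustrative, and treating a missing `textList` key as "no matches" is an assumption; this section does not say what the API returns when nothing matches.

```python
import json


def parse_matches(response_body: str) -> list:
    """Return the textList array, or an empty list if the key is absent."""
    data = json.loads(response_body)
    matches = data.get("textList", [])  # assumption: absent key means no matches
    if not isinstance(matches, list):
        raise ValueError("textList is not an array")
    return [str(m) for m in matches]


sample = '{"textList": ["extracted text 1", "extracted text 2", "extracted text 3"]}'
for match in parse_matches(sample):
    print(match)
```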

Request Example

Headers:

Content-Type: application/json
Authorization: Basic YOUR_BASE64_ENCODED_API_KEY

Note:

  • Get your API key from the PDF4me Dashboard
  • The API key must be Base64 encoded and prefixed with "Basic " in the Authorization header
  • Example: If your API key is abc123, encode it to Base64 and use Authorization: Basic YWJjMTIz
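The encoding step from the note above can be reproduced with the Python standard library. The helper name `basic_auth_header` is illustrative and not part of any PDF4me SDK.

```python
import base64


def basic_auth_header(api_key: str) -> dict:
    """Return the Authorization header for a raw (not yet encoded) API key."""
    token = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("abc123"))  # {'Authorization': 'Basic YWJjMTIz'}
```

Base64 is reversible, so the encoded key is as sensitive as the raw key and should be kept out of source control and logs.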

Payload

Basic Request:

{
"docContent": "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCjIgMCBvYmoKPDwKL1R5cGUgL1BhZ2VzCi9LaWRzIFszIDAgUl0KL0NvdW50IDEKPD4KZW5kb2JqCjMgMCBvYmoKPDwKL1R5cGUgL1BhZ2UKL1BhcmVudCAyIDAgUgovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Ci9Gb250IDw8Ci9GMSA0IDAgUgo+Pgo+PgovQ29udGVudHMgNSAwIFIKPj4KZW5kb2JqCjQgMCBvYmoKPDwKL1R5cGUgL0ZvbnQKL1N1YnR5cGUgL1R5cGUxCi9CYXNlRm9udCAvSGVsdmV0aWNhCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9MZW5ndGggNDQKPj4Kc3RyZWFtCkJUCi9GMSAxMiBUZgoxMDAgNzAwIFRkCihIZWxsbyBXb3JsZCkgVGoKRVQKZW5kc3RyZWFtCmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDkgMDAwMDAgbiAKMDAwMDAwMDA1NCAwMDAwMCBuIAowMDAwMDAwMTAxIDAwMDAwIG4gCjAwMDAwMDAxNzAgMDAwMDAgbiAKMDAwMDAwMDI0NCAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKPj4Kc3RhcnR4cmVmCjM0MQolJUVPRg==",
"docName": "output.pdf",
"expression": "%",
"pageSequence": "1-3"
}

With Asynchronous Processing:

{
"docContent": "JVBERi0xLjQKJeLjz9MK...",
"docName": "output.pdf",
"expression": "\\d+",
"pageSequence": "1-3",
"async": true
}
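Putting the headers and payload together, a request could be built with Python's standard library roughly as follows. The host in `BASE_URL` is an assumption, since this section gives only the endpoint path; replace it with the base URL of your PDF4me account. The helper names `build_request` and `send` are illustrative.

```python
import base64
import json
import urllib.request

# Assumption: this section does not name the host; substitute your real base URL.
BASE_URL = "https://api.pdf4me.com"
ENDPOINT = "/api/v2/ExtractTextByExpression"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Create a POST request with the JSON body and Basic authorization header."""
    token = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send the request; return status, Location header (if any), and raw body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.status, response.headers.get("Location"), response.read()


payload = {
    "docContent": "JVBERi0xLjQKJeLjz9MK...",  # truncated placeholder, not a valid PDF
    "docName": "output.pdf",
    "expression": r"\d+",
    "pageSequence": "1-3",
}
request = build_request("abc123", payload)  # call send(request) to execute it
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, while 200 and 202 are returned normally, so `send` can serve both processing modes.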

Code Samples

The PDF4me Extract Text by Expression REST API provides code samples in multiple programming languages. Choose the language that best fits your development environment:

C# (CSharp) Sample

Complete C# implementation for extracting text by expression from PDF:

Text Extraction Features

Regular Expression Processing

  • Pattern Matching: Full support for standard regular expression syntax and patterns
  • Complex Patterns: Support for advanced regex features including groups, quantifiers, and anchors
  • Custom Expressions: Flexible pattern creation for specific text extraction requirements
  • Pattern Validation: Built-in validation for regex pattern syntax and compatibility
  • Professional Results: Reliable text extraction with accurate pattern matching
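Before sending a pattern to the API, it can help to try it locally on sample text. The sketch below uses Python's `re` module as a stand-in; the regex engine behind the service is not named in this section and may differ from Python's in some syntax details, and whether the API returns the whole match or only captured groups is not stated here either. The function name `preview_matches` and the sample text are illustrative.

```python
import re


def preview_matches(pattern: str, text: str) -> list:
    """Return every non-overlapping full match of pattern in text, in order."""
    compiled = re.compile(pattern)  # raises re.error on invalid syntax
    return [m.group(0) for m in compiled.finditer(text)]


sample_text = "Invoice 1042: discount 15% applied, total 380 EUR"

print(preview_matches(r"\d+", sample_text))        # ['1042', '15', '380']
print(preview_matches(r"%", sample_text))          # ['%']
print(preview_matches(r"[A-Za-z]+", sample_text))  # ['Invoice', 'discount', 'applied', 'total', 'EUR']
```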

Page Processing

  • Page Targeting: Extract text from specific pages or entire documents
  • Page Sequences: Support for custom page ranges and individual page selection
  • Flexible Processing: Process any combination of pages with precise control
  • Batch Processing: Handle multiple pages efficiently in a single API call
  • Professional Layout: Consistent text extraction across all target pages
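To make the pageSequence notation concrete, the sketch below expands a sequence string into explicit page numbers on the client side. It covers the three documented forms ("1-", "1-3", "1,2,3"). Accepting mixed forms such as "1,3-5" and clipping ranges to the document length are assumptions of this sketch, not documented API behavior, and the function name `expand_page_sequence` is illustrative.

```python
def expand_page_sequence(sequence: str, total_pages: int) -> list:
    """Expand a pageSequence string into sorted 1-based page numbers."""
    pages = set()
    for part in sequence.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range such as "1-" runs to the last page.
            end = int(end_text) if end_text else total_pages
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Assumption: pages beyond the end of the document are ignored.
        pages.update(range(start, min(end, total_pages) + 1))
    return sorted(pages)


print(expand_page_sequence("1-", 5))      # [1, 2, 3, 4, 5]
print(expand_page_sequence("1-3", 10))    # [1, 2, 3]
print(expand_page_sequence("1,2,3", 10))  # [1, 2, 3]
```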

Advanced Features

  • Text Analysis: Comprehensive text pattern analysis and identification
  • Custom Filtering: Advanced filtering options for specific text extraction needs
  • Professional Extraction: High-quality text extraction with clear visibility
  • Flexible Patterns: Support for any regular expression pattern and text matching requirements

Industry Use Cases & Applications

Legal & Professional Services Use Cases

  • Document Analysis: Extract specific data patterns from contracts, invoices, and legal documents
  • Contract Analysis: Extract key terms and data patterns from legal contracts
  • Compliance Monitoring: Extract regulatory information and compliance data from documents
  • Legal Data Extraction: Extract specific data patterns from legal documents
