
Extract Text by Expression


The Extract Text by Expression API extracts the text in a PDF that matches a regular expression. You send the PDF as a Base64 string (docContent), together with docName, expression (the regex pattern), pageSequence (for example 1-, 1-3, or 1,2,3), and optionally async. The API returns JSON containing all matches. Use the tester below to try it; the sections that follow give more details and expression examples.

Try the Extract Text by Expression API

:::note Quick reference
Endpoint: POST /api/v2/ExtractTextByExpression · Required: api-key, docContent, docName, expression, pageSequence
:::

:::info Try it live
Use the form below to send your API key, a PDF (Base64), and a regex pattern (e.g. %, \d+, or an email pattern). The response is JSON with all text matches. No code required: fill in the fields and click Send request.
:::


Overview, parameters, and use cases

What is Extract Text by Expression?

This endpoint extracts the text in a PDF that matches a regular expression. You provide the PDF (Base64), an expression (the regex pattern), a pageSequence (e.g. 1- for all pages, 1-3 for a range, 1,2,3 for specific pages), and optionally async. The API returns JSON with all text matches. Use it to pull emails, numbers, dates, URLs, or any other pattern from PDFs.
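As a rough illustration, the request described above could be assembled in Python like this. The field names and the endpoint path come from this page; everything else is an assumption for the sketch: the helper names `build_payload` and `build_request` are hypothetical, `async` is treated as a boolean, and the base URL and authentication headers are left to the caller because this page does not specify them. The request is only built, not sent.

```python
import base64
import json
import urllib.request

# Path taken from the quick reference on this page.
ENDPOINT_PATH = "/api/v2/ExtractTextByExpression"


def build_payload(pdf_bytes, doc_name, expression, page_sequence="1-", run_async=False):
    """Assemble the JSON body using the field names listed on this page.

    Assumption: "async" is sent as a boolean; the page does not state its type.
    """
    return {
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        "docName": doc_name,
        "expression": expression,
        "pageSequence": page_sequence,
        "async": run_async,
    }


def build_request(base_url, auth_headers, payload):
    """Create (but do not send) a POST request for the endpoint.

    base_url and auth_headers are supplied by the caller: this page does not
    say which host to use or how the API key is transmitted.
    """
    headers = {"Content-Type": "application/json"}
    headers.update(auth_headers)
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` once the base URL and authentication headers for your account are filled in.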

Key features

  • Regex patterns – expression: e.g. %, \d+, email pattern, date pattern, currency.
  • Page targeting – pageSequence: 1- (all), 1-3 (range), 1,2,3 (specific pages).
  • JSON response – All matches for the specified pattern.
  • Async – Use async for large PDFs or many matches.
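The pageSequence forms listed above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the function name `is_valid_page_sequence` is hypothetical, and it also accepts mixed values such as 1-3,5, which this page does not confirm the API supports.

```python
import re

# One comma-separated part: a page number, optionally followed by "-" and an
# optional end page. Covers "1-" (open range), "1-3" (range), and "2" (single page).
_PART = re.compile(r"(\d+)(?:-(\d*))?")


def is_valid_page_sequence(value):
    """Return True if value looks like one of the pageSequence forms shown on this page."""
    for part in value.split(","):
        match = _PART.fullmatch(part.strip())
        if match is None:
            return False
        start = int(match.group(1))
        end = match.group(2)  # None (no dash), "" (open range), or digits
        if start < 1:
            return False
        if end and int(end) < start:
            return False
    return True
```

For example, `is_valid_page_sequence("1-")` and `is_valid_page_sequence("1,2,3")` return `True`, while `is_valid_page_sequence("3-1")` returns `False`.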

:::tip Best for
Use when you need specific data (emails, numbers, dates, amounts) from PDFs. For full text or images use Extract Resources; for tables use Extract Table from PDF.
:::

Prerequisites

Before using this endpoint, make sure you have:

  • A valid PDF4me API key (Get your API Key)
  • A PDF document in Base64 format or a public URL to a PDF file
  • A regular expression pattern to search for

Expression Examples

Common basic regular expression patterns:

  • % - Match the percent sign itself (as found in 50% or 100%); the digits before it are not part of the match
  • \d+ - Match one or more digits (e.g., 123, 4567)
  • \d+\.\d+ - Match decimal numbers (e.g., 3.14, 99.99)
  • [A-Za-z]+ - Match one or more letters
  • \d{4} - Match four consecutive digits (e.g., years like 2024)
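It can help to try a pattern locally before sending it to the API. The sketch below runs the patterns listed above against a made-up sample string using Python's `re` module. The sample text and the `find_all` helper are illustrative only, and the regex dialect used by the service may differ from Python's in some details.

```python
import re

# Made-up sample text, for illustration only.
SAMPLE = "Invoice 2024: 50% of 99.99 paid, 123 items"

# The patterns from the list above.
PATTERNS = {
    "percent sign": r"%",
    "digits": r"\d+",
    "decimal number": r"\d+\.\d+",
    "letters": r"[A-Za-z]+",
    "four digits": r"\d{4}",
}


def find_all(pattern, text):
    """Return every non-overlapping match of pattern in text, left to right."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    for label, pattern in PATTERNS.items():
        print(f"{label:15} {pattern:12} {find_all(pattern, SAMPLE)}")
```

With this sample, `\d+` finds 99 twice (once on each side of the decimal point), whereas `\d+\.\d+` returns 99.99 as a single match, so the more specific pattern is the better choice when amounts are what you need.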

Response Format

The API returns a JSON response with all text matches found for the specified regular expression pattern.
