Extract Text by Expression
The Extract Text by Expression API extracts text from a PDF that matches a regular expression. You send:

- the PDF content as Base64 (docContent)
- the file name (docName)
- a regex pattern (expression)
- the pages to search (pageSequence, e.g. 1-, 1-3, or 1,2,3)
- optionally, async

The API returns JSON with all matches. Use the tester below to try it. The sections that follow cover parameters and expression examples in more detail.
Try the Extract Text by Expression API
:::note Quick reference
Endpoint: POST /api/v2/ExtractTextByExpression · Required: api-key, docContent, docName, expression, pageSequence
:::
:::info Try it live
Use the form below to send your API key, a PDF (Base64), and a regex pattern (e.g. %, \d+, or an email pattern). The response is JSON with all text matches. No code is required: fill in the fields and click Send request.
:::
Overview, parameters, and use cases
What is Extract Text by Expression?
This endpoint extracts text from a PDF that matches a regular expression. You provide the PDF (Base64), expression (regex pattern), pageSequence (e.g. 1- for all pages, 1-3 for a range, 1,2,3 for specific pages), and optionally async. The API returns JSON with all text matches. Use it to pull emails, numbers, dates, URLs, or any pattern from PDFs.
Key features
- Regex patterns – expression: e.g. `%`, `\d+`, an email pattern, a date pattern, or a currency pattern.
- Page targeting – pageSequence: `1-` (all), `1-3` (range), `1,2,3` (specific pages).
- JSON response – All matches for the specified pattern.
- Async – Use async for large PDFs or many matches.
:::tip Best for
Use it when you need specific data (emails, numbers, dates, amounts) from PDFs. For full text or images, use Extract Resources. For tables, use Extract Table from PDF.
:::
API parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| api-key | string | Yes | Your PDF4me API key, Base64 encoded. Get it from the dashboard. |
| docContent | base64 | Yes | PDF file content (Base64). |
| docName | string | Yes | PDF file name with .pdf extension. |
| expression | string | Yes | Regular expression pattern (e.g. %, \d+, email pattern). |
| pageSequence | string | Yes | Page range: 1- (all), 1,2,3, 1-5, etc. |
| async | boolean | No | Enable asynchronous processing. |
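The parameters above map directly onto a JSON request body. Below is a minimal Python sketch using only the standard library. It makes two assumptions that you should confirm in the PDF4me API docs:

- the host is `https://api.pdf4me.com`;
- the key is sent in an `Authorization: Basic <api-key>` header.

The helper names `build_payload` and `extract_text_by_expression` are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Any

# Assumption: public PDF4me API host; confirm it in the PDF4me API docs.
BASE_URL = "https://api.pdf4me.com"
ENDPOINT = "/api/v2/ExtractTextByExpression"


def build_payload(pdf_bytes: bytes, doc_name: str, expression: str,
                  page_sequence: str = "1-", use_async: bool = False) -> dict:
    """Build the JSON body using the parameter names from the table above."""
    return {
        # docContent is the Base64-encoded PDF, sent as an ASCII string.
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        "docName": doc_name,            # file name including the .pdf extension
        "expression": expression,       # regex pattern, e.g. r"\d+%"
        "pageSequence": page_sequence,  # "1-" searches every page
        "async": use_async,             # "async" is a Python keyword, hence use_async
    }


def extract_text_by_expression(api_key: str, payload: dict) -> Any:
    """POST the payload and return the parsed JSON response.

    Assumption: the key is passed unchanged in an
    "Authorization: Basic <api-key>" header.
    """
    request = urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {api_key}",  # assumed header format
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `build_payload(pathlib.Path("invoice.pdf").read_bytes(), "invoice.pdf", r"\d+%")` builds a body that searches all pages for percentages. Passing it to `extract_text_by_expression` with your key returns the parsed JSON. Set `use_async=True` for large documents.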
When to use Extract Text by Expression
- Emails and URLs – Extract emails, phone numbers, or links from PDFs.
- Numbers and amounts – Extract percentages, currency amounts, or IDs.
- Dates – Extract dates in various formats (MM/DD/YYYY, ISO, etc.).
- Structured data – Pull specific patterns (e.g. invoice numbers, SSNs) for workflows.
:::info Need the full API?
For request/response schemas and code samples, see Extract Text by Expression in the PDF4me API docs.
:::
Prerequisites
Before using this endpoint, make sure you have:
- A valid PDF4me API key (Get your API Key)
- A PDF document in Base64 format or a public URL to a PDF file
- A regular expression pattern to search for
Expression Examples
Common basic regular expression patterns:
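- `%` - Match the percent sign itself. In 50% or 100%, only the `%` character is returned; use `\d+%` to capture the number too.
- `\d+` - Match one or more digits (e.g., 123, 4567)
- `\d+\.\d+` - Match decimal numbers (e.g., 3.14, 99.99)
- `[A-Za-z]+` - Match one or more letters
- `\d{4}` - Match exactly 4 digits (e.g., years like 2024). Add word boundaries (`\b\d{4}\b`) to avoid matching the first four digits of longer numbers.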
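To preview what a pattern will return before calling the API, you can run it locally against some text. This sketch uses Python's `re` module on a made-up sample string. The API's own regex engine may differ in edge cases, so treat the result as an approximation.

```python
import re

# Made-up text standing in for content extracted from a PDF.
sample = "Revenue grew 50% in 2024, from 3.14 to 99.99 units (order 123456)."

patterns = [r"%", r"\d+\.\d+", r"\d{4}", r"\b\d{4}\b"]

for pattern in patterns:
    # re.findall returns every non-overlapping match, left to right.
    print(f"{pattern!r}: {re.findall(pattern, sample)}")
```

Here are the results for each pattern:

- `%` returns only `['%']`.
- `\d+\.\d+` returns `['3.14', '99.99']`.
- `\d{4}` returns `['2024', '1234']`, because it also takes the first four digits of 123456.
- `\b\d{4}\b` returns just `['2024']`.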
Patterns for extracting financial data:
- `\d+%` - Match percentages with numbers (e.g., 50%, 100%)
- `\$[\d,]+\.?\d*` - Match US dollar amounts (e.g., $100, $1,234.56)
- `€[\d,]+\.?\d*` - Match Euro amounts (e.g., €50, €1,000.00)
- `£[\d,]+\.?\d*` - Match British Pound amounts (e.g., £50, £1,000.00)
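A quick local check of the currency and percentage patterns, again using Python's `re` module on made-up text. This is an approximation: the API's regex engine is not necessarily Python's.

```python
import re

# Made-up text; each amount is followed by a space, so no punctuation is captured.
sample = "Subtotal $1,234.56 and fee $100 plus €1,000.00 or £50 with 15% off"

patterns = {
    "percent": r"\d+%",
    "usd": r"\$[\d,]+\.?\d*",
    "eur": r"€[\d,]+\.?\d*",
    "gbp": r"£[\d,]+\.?\d*",
}

for name, pattern in patterns.items():
    print(name, re.findall(pattern, sample))
```

`[\d,]+` accepts commas and `\.?` accepts a lone period, so an amount followed by punctuation picks it up. For example, `re.findall(r"\$[\d,]+\.?\d*", "fee $100, due")` returns `['$100,']`. Strip a trailing `,` or `.` from results if that matters for your workflow.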
Extract contact details from documents:
- `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}` - Match email addresses (e.g., name@example.com)
- `\+?[\d\s\-\(\)]{10,}` - Match phone numbers (various formats)
- `https?://[^\s]+` - Match URLs (e.g., https://example.com)
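The contact patterns can be previewed the same way. This sketch runs them with Python's `re` module on a made-up sample. The phone pattern's character class includes whitespace, so matches can carry surrounding spaces; the sketch strips them.

```python
import re

# Made-up contact details standing in for text extracted from a PDF.
sample = ("Contact jane.doe@example.com or sales@example.co.uk, "
          "call +1 (555) 123-4567, or visit https://example.com/contact today.")

EMAIL = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
PHONE = r"\+?[\d\s\-\(\)]{10,}"
URL = r"https?://[^\s]+"

emails = re.findall(EMAIL, sample)
# The phone class includes \s, so trim leading/trailing whitespace from matches.
phones = [match.strip() for match in re.findall(PHONE, sample)]
urls = re.findall(URL, sample)
print(emails, phones, urls)
```

`[^\s]+` stops only at whitespace. A URL that ends a sentence (e.g. `https://example.com.`) therefore keeps the final period.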
Extract dates in various formats:
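- `\d{1,2}/\d{1,2}/\d{2,4}` - Match dates like 12/25/2024 or 01/05/24
- `\d{4}-\d{2}-\d{2}` - Match ISO dates (e.g., 2024-12-25)
- `[A-Z][a-z]{2}\s+\d{1,2},?\s+\d{4}` - Match dates like "Dec 25 2024" or "Dec 25, 2024". An uppercase-only class such as `[A-Z]{2,3}` would match "DEC 25 2024" but not "Dec 25 2024".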
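Month-name dates are where patterns most often go wrong. This sketch, using Python's `re` module on made-up text, compares two month classes:

- `[A-Z]{2,3}` matches only fully capitalized abbreviations such as "DEC".
- `[A-Z][a-z]{2}` matches title-case ones such as "Dec".

```python
import re

# Made-up dates in several formats.
sample = ("Signed 12/25/2024, renewed 01/05/24, due 2025-01-31, "
          "filed Dec 25 2024 and DEC 26 2024.")

SLASH_DATE = r"\d{1,2}/\d{1,2}/\d{2,4}"
ISO_DATE = r"\d{4}-\d{2}-\d{2}"
# Uppercase-only month class: matches "DEC" but not "Dec".
UPPER_MONTH = r"[A-Z]{2,3}\s+\d{1,2}\s+\d{4}"
# One capital plus two lowercase letters, with an optional comma after the day.
TITLE_MONTH = r"[A-Z][a-z]{2}\s+\d{1,2},?\s+\d{4}"

for pattern in (SLASH_DATE, ISO_DATE, UPPER_MONTH, TITLE_MONTH):
    print(pattern, re.findall(pattern, sample))
```

To catch both casings, run both patterns or combine them with `|`.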
Extract specific text patterns:
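- `PDF|pdf` - Match "PDF" or "pdf". Only these two casings match; `[Pp][Dd][Ff]` also matches mixed case such as "Pdf".
- `Chapter\s+\d+` - Match "Chapter" followed by a number (e.g., Chapter 1)
- `[A-Z][a-z]+\s+[A-Z][a-z]+` - Match capitalized names (e.g., John Doe). Any two adjacent capitalized words match, so place names such as "New York" are returned too.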
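The text patterns are easy to over-trust. This local sketch uses Python's `re` module on a made-up sentence to show two things:

- `PDF|pdf` misses mixed case.
- The two-capitalized-words pattern also picks up phrases that are not names.

```python
import re

# Made-up sentence standing in for extracted PDF text.
sample = "See Chapter 12 of the PDF guide by John Doe; pdf and Pdf files differ in New York."

alternation = re.findall(r"PDF|pdf", sample)       # exact casings only
any_case = re.findall(r"[Pp][Dd][Ff]", sample)     # every casing of "pdf"
chapters = re.findall(r"Chapter\s+\d+", sample)
names = re.findall(r"[A-Z][a-z]+\s+[A-Z][a-z]+", sample)
print(alternation, any_case, chapters, names)
```

`names` comes back as `['See Chapter', 'John Doe', 'New York']`: only one of the three is a person's name. Treat this pattern as a candidate filter, not a name detector.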
Complex patterns for specific use cases:
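- `\b\d{3}-\d{2}-\d{4}\b` - Match SSN format (e.g., 123-45-6789)
- `\b\d{4}\s\d{4}\s\d{4}\s\d{4}\b` - Match credit card numbers (16 digits in groups of four separated by spaces)
- `[A-Z]{2}\d{2}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}` - Match 28-character IBANs whose account part is all digits, optionally grouped in blocks of four. IBAN length and allowed characters vary by country, so adapt the pattern to the countries you expect.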
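This sketch previews the advanced patterns with Python's `re` module on made-up identifiers. The IBAN uses the placeholder country code "XX". It shows that `\b` keeps the SSN pattern from firing inside a longer digit run. It also shows that patterns can overlap: the card pattern matches four digit groups inside the IBAN.

```python
import re

# Made-up identifiers; none of these refer to real accounts or people.
sample = ("SSN 123-45-6789, ref 9123-45-67890, card 4111 1111 1111 1111, "
          "IBAN XX12 3456 7890 1234 5678 9012 3456.")

SSN = r"\b\d{3}-\d{2}-\d{4}\b"
CARD = r"\b\d{4}\s\d{4}\s\d{4}\s\d{4}\b"
IBAN = r"[A-Z]{2}\d{2}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}"

ssns = re.findall(SSN, sample)    # "9123-45-67890" is skipped thanks to \b
cards = re.findall(CARD, sample)  # also hits "3456 7890 1234 5678" inside the IBAN
ibans = re.findall(IBAN, sample)
print(ssns, cards, ibans)
```

When you extract several kinds of identifiers from the same document, cross-check overlapping matches before acting on them.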
Recommended expressions to test with your sample PDF file:
- `%` - Find all percent signs (start simple)
- `\d+` - Extract all numbers in the document
- `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}` - Extract email addresses
- `https?://[^\s]+` - Extract web links
- `\d{1,2}/\d{1,2}/\d{2,4}` - Extract dates
- `\$[\d,]+\.?\d*` - Extract dollar amounts
Page Sequence Examples:
- `1-` - All pages from page 1 to the end
- `1-3` - Pages 1, 2, and 3
- `1,2,3` - Specific pages 1, 2, and 3
- `all` - All pages (if supported)
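To sanity-check a pageSequence value before sending it, `expand_page_sequence` is a hypothetical Python helper. It expands the forms listed above into page numbers for a document with a known page count. It is not part of the PDF4me API, and the service's own parsing may differ. Two open questions are whether `all` is accepted and how out-of-range pages are handled; this sketch simply drops pages that do not exist.

```python
def expand_page_sequence(sequence: str, page_count: int) -> list[int]:
    """Expand a pageSequence string such as "1-", "1-3" or "1,2,3".

    Hypothetical local helper that mirrors the documented examples;
    it is not the PDF4me service's parser.
    """
    sequence = sequence.strip().lower()
    if sequence == "all":  # only meaningful if the service supports "all"
        return list(range(1, page_count + 1))

    pages: list[int] = []
    for part in sequence.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range such as "1-" runs to the last page.
            end = int(end_text) if end_text else page_count
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))

    # Keep pages that exist, preserving order and dropping duplicates.
    seen: set[int] = set()
    result: list[int] = []
    for page in pages:
        if 1 <= page <= page_count and page not in seen:
            seen.add(page)
            result.append(page)
    return result
```

For a 5-page PDF, `expand_page_sequence("1-", 5)` gives `[1, 2, 3, 4, 5]`, and `expand_page_sequence("1,2,3", 5)` gives `[1, 2, 3]`.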
Response Format
The API returns a JSON response with all text matches found for the specified regular expression pattern.