PDF OCR in Make

What this module does

PDF4me — PDF OCR converts scanned or image-based PDFs into fully searchable text documents inside a Make scenario. Choose Standard quality for born-digital PDFs or Expert quality for scanned, photographed, or low-resolution documents. Enable OCR Only When Needed to skip pages that already have selectable text, and use Is Async for files over 10 pages to prevent scenario timeouts. The output is a searchable PDF ready for text extraction, archival, full-text indexing, or AI processing.

Authenticating Your API Request

Every PDF4me module in Make requires a valid Connection. Create or select one that holds your PDF4me API key so the scenario can authenticate OCR processing requests securely.

Important Facts You Should Not Miss

Expert vs Standard Quality

Use Expert for scanned or photographed documents — it applies two processing passes per page for higher accuracy. Use Standard for born-digital PDFs where OCR is still needed — it is faster and costs half as many API credits per page.

OCR Only When Needed saves credits

When enabled, the module checks each page before processing — pages that already contain selectable text are skipped and OCR is applied only to image-based pages. This saves API credits when your input PDF mixes native and scanned pages.
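To get a feel for the savings, the following sketch estimates API calls for a PDF that mixes native and scanned pages. It assumes the per-page figures given in this guide (1 call per page for Standard, 2 for Expert) and that skipped pages cost nothing. The function `estimate_ocr_calls` and its arguments are hypothetical names used for illustration; they are not part of PDF4me or Make.

```python
# Illustrative only: per-page figures are taken from this guide, and
# skipped pages are assumed to cost nothing.
CALLS_PER_PAGE = {"Standard": 1, "Expert": 2}


def estimate_ocr_calls(page_has_text, quality_type="Expert", ocr_only_when_needed=False):
    """Estimate API calls for one PDF.

    page_has_text: list of booleans, True where a page already has selectable text.
    """
    per_page = CALLS_PER_PAGE[quality_type]
    if ocr_only_when_needed:
        # Only image-based pages are processed.
        pages_to_process = sum(1 for has_text in page_has_text if not has_text)
    else:
        pages_to_process = len(page_has_text)
    return pages_to_process * per_page


# A 10-page PDF: 6 native-text pages and 4 scanned pages.
pages = [True] * 6 + [False] * 4
print(estimate_ocr_calls(pages, "Expert"))                             # 20
print(estimate_ocr_calls(pages, "Expert", ocr_only_when_needed=True))  # 8
```

In this example, enabling OCR Only When Needed reduces the estimate from 20 calls to 8, because only the 4 scanned pages are processed.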

Use Is Async for files over 10 pages

Without async mode, large PDFs may cause the Make scenario to time out before OCR completes. Enable Is Async for any document with more than 10 pages to allow the module to return results asynchronously when processing finishes.

Figure: The PDF4me PDF OCR module in Make, showing the Files field mapped from a prior step, Output File Name, Quality Type set to Expert, OCR Only When Needed, Language Code, and Is Async.

Files, Output File Name, and Quality Type are required. OCR Only When Needed, Language Code, and Is Async are optional but recommended for mixed-content and large PDFs.

Parameters

Required: Connection, Files, Output File Name, and Quality Type — all must be provided. OCR Only When Needed, Language Code, and Is Async are optional.

| Parameter | Required | What it does | Example |
| --- | --- | --- | --- |
| Connection | Yes | PDF4me API connection. Click Add and paste your API key if connecting for the first time. | Your PDF4me connection |
| Files | Yes | Binary PDF file mapped from a preceding module: scanned, photographed, or image-based PDFs. | 1. Data |
| Output File Name | Yes | Filename for the searchable PDF output, including the .pdf extension. | scanned_searchable.pdf |
| Quality Type | Yes | Standard for born-digital PDFs (1 API call per page, faster). Expert for scanned or image-based documents (2 API calls per page, higher accuracy). | Expert |
| OCR Only When Needed | No | When set to Yes, pages with existing selectable text are skipped and OCR is applied only to image-based pages. Saves API credits for mixed-content PDFs. | Yes |
| Language Code | No | ISO language code for the OCR engine. Common values: eng, deu, fra, spa, ita, por. Leave blank for automatic language detection. | eng |
| Is Async | No | Set to Yes for PDFs with more than 10 pages to prevent a Make scenario timeout. Results are returned asynchronously when processing completes. | Yes |
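If you prepare the Output File Name and Language Code values outside Make (for example, in a script that feeds a webhook), a small normalisation step can prevent avoidable errors. The sketch below is one possible approach. The helper names are hypothetical, and the set of language codes contains only the six values listed in this guide; the OCR engine may accept others.

```python
from pathlib import PurePosixPath

# Only the six codes listed in this guide; extend the set if you use others.
LISTED_LANGUAGE_CODES = {"eng", "deu", "fra", "spa", "ita", "por"}


def searchable_name(input_name, suffix="_searchable"):
    """Build an Output File Name that always ends in .pdf."""
    stem = PurePosixPath(input_name).stem
    return f"{stem}{suffix}.pdf"


def language_code_or_blank(value):
    """Return a listed code in lowercase, or "" to request automatic detection."""
    code = (value or "").strip().lower()
    return code if code in LISTED_LANGUAGE_CODES else ""


print(searchable_name("scanned.pdf"))     # scanned_searchable.pdf
print(language_code_or_blank(" DEU "))    # deu
print(language_code_or_blank("klingon"))  # prints an empty line
```

Falling back to a blank value for an unlisted code is a design choice made for this sketch: it lets the module use automatic detection instead of receiving a value it might not recognise.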

Quick Setup

  1. Add the PDF4me PDF OCR module to your Make scenario.
  2. Select Connection (or click Add to create one with your API key).
  3. Map Files to the binary output of the preceding download module (e.g. Dropbox → Download a File → Data).
  4. Enter an Output File Name — e.g. scanned_searchable.pdf.
  5. Set Quality Type to Expert for scanned documents or Standard for born-digital PDFs.
  6. Optionally enable OCR Only When Needed and Is Async (for files over 10 pages), then click Save and run. The output Doc Data is your searchable PDF.
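The choices in steps 5 and 6 follow a simple set of rules, which can be sketched as a small function. This is an illustration of the guidance in this guide, not a PDF4me or Make API: the function name and its inputs are hypothetical, and the returned keys simply mirror the module's field labels.

```python
def suggest_ocr_settings(page_count, is_scanned, has_mixed_pages=False):
    """Suggest field values for the PDF OCR module (illustrative only)."""
    return {
        # Expert for scanned or photographed input, Standard for born-digital PDFs.
        "Quality Type": "Expert" if is_scanned else "Standard",
        # Worth enabling when native-text and scanned pages are mixed.
        "OCR Only When Needed": "Yes" if has_mixed_pages else "No",
        # This guide recommends async mode for files over 10 pages.
        "Is Async": "Yes" if page_count > 10 else "No",
    }


print(suggest_ocr_settings(page_count=24, is_scanned=True, has_mixed_pages=True))
# {'Quality Type': 'Expert', 'OCR Only When Needed': 'Yes', 'Is Async': 'Yes'}
```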

Workflow Examples

Common Make scenario patterns using PDF OCR.
Scanned invoice OCR and data extraction
  1. Gmail Watch triggers when a new email with a scanned invoice attachment arrives.
  2. PDF OCR runs with Quality Type set to Expert and OCR Only When Needed enabled.
  3. Extract Resources extracts the searchable text from the Doc Data output.
  4. The extracted text is parsed for invoice fields and posted to the accounting system automatically.
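The parsing in step 4 depends entirely on your invoice layout. As a minimal sketch, assuming the extracted text contains labelled lines such as "Invoice No:", "Date:", and "Total Due:", regular expressions can pull out the fields. The patterns, field names, and sample text below are hypothetical and would need adjusting for real invoices.

```python
import re

# Hypothetical patterns for a simple labelled layout; real invoices vary.
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"Date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(
        r"Total\s*(?:Due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def parse_invoice_fields(text):
    """Return a dict of matched fields; a field is None when no pattern matches."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Made-up sample standing in for text extracted from the searchable PDF.
sample = """ACME Supplies
Invoice No: INV-2024-0173
Date: 2024-03-18
Total Due: $1,284.50
"""
print(parse_invoice_fields(sample))
# {'invoice_number': 'INV-2024-0173', 'date': '2024-03-18', 'total': '1,284.50'}
```

Returning None for missing fields, instead of raising an error, lets a later step route incomplete invoices to manual review.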
Legacy document digitisation batch
  1. Dropbox Watch triggers when a new scan lands in the incoming folder.
  2. PDF OCR processes the file with Expert quality and Is Async set to Yes for scans longer than 10 pages.
  3. The searchable PDF is uploaded to the SharePoint archive folder.
  4. Slack notifies the team when each document is digitised and indexed.
Signed contract OCR archive with Airtable log
  1. OneDrive Watch triggers when a signed paper contract scan is uploaded to the contracts folder.
  2. PDF OCR converts the scanned contract to a searchable PDF with Expert quality.
  3. Google Drive saves the searchable PDF to the Contracts Archive folder.
  4. Airtable creates a new record with the contract name, date, and a link to the searchable PDF for compliance tracking.

Frequently Asked Questions

Which Quality Type should I choose?
Use Expert for scanned documents, photographed pages, or low-quality images — it applies two processing passes per page for higher accuracy. Use Standard for born-digital PDFs (e.g. generated by software or saved from Word) where OCR is still required — it is faster and uses half as many API credits per page.
What does OCR Only When Needed do?
When enabled, the module checks each page before processing. Pages that already contain selectable text are skipped — OCR is applied only to image-based pages. This is especially useful for PDFs that mix native digital pages with scanned inserts, since it avoids processing pages that do not need OCR.
Which languages does the OCR engine support?
Common Language Code values are eng (English), deu (German), fra (French), spa (Spanish), ita (Italian), and por (Portuguese). Leave the Language Code field blank for automatic detection. Providing the correct code when the language is known improves accuracy.
When should I enable Is Async?
Enable Is Async for any PDF with more than 10 pages. Without async mode, large multi-page documents may cause the Make scenario to time out before OCR processing finishes. With Is Async enabled, the module returns its results once processing completes, regardless of file size.
What is in the Doc Data output?
Doc Data contains the binary of the OCR-processed searchable PDF. The text layer is embedded inside the PDF — you can pass it to the Extract Resources module to extract the text, save it directly to cloud storage, or forward it to an AI analysis step for further processing.
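If you handle the Doc Data binary in your own code after the scenario runs (for example, in a service that receives the file), a quick sanity check before saving can catch an empty or non-PDF payload. The sketch below relies only on the fact that PDF files begin with the `%PDF-` header; the function name is hypothetical, and the check does not confirm that a text layer is present.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF header marker."""
    return data.startswith(b"%PDF-")


print(looks_like_pdf(b"%PDF-1.7\n..."))   # True
print(looks_like_pdf(b"<html>error</html>"))  # False
```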
