
How to Extract Text from Scanned PDFs in Zapier: Watch → OCR → Extract (3 Steps)

· 10 min read
SEO and Content Writer

Scanned invoices, faxed forms, or screenshots saved as PDFs—they look like documents, but the text is just pixels. You can’t select it, copy it, or feed it into your Zap. The workaround: run OCR first to make the PDF searchable, then extract the text. Here’s a three-step Zap that does exactly that, using Dropbox and PDF4me.

In a nutshell: Drop a PDF into Dropbox → Zap triggers → OCR makes it searchable → Extract Text pulls the content into full Text. Map that field into Sheets, Airtable, email, or anything else.

What You’ll Get at the End

When a PDF lands in your Dropbox folder, the Zap runs OCR (if needed) and extracts all text into a single field—full Text. You can map that into Google Sheets, Airtable, email, or any next step. The output includes Trace Id for debugging and Page Text Info for per-page details.

Output from the Zap: full Text (“Sample PDF Document” plus lines 1–10), Trace Id, and Page Text Info.


Why Image-Based PDFs Need OCR First

Image-based PDFs store text as pixels. Extraction tools expect a text layer, so they return nothing useful. Make PDF Searchable / OCR adds that layer—then Extract Text From PDF can read everything. Skip the OCR step only if your PDFs already have selectable text (try copying a word in a reader to verify).
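The manual check (try copying a word) can be approximated in code when you have many files to sort. This is a minimal sketch, assuming you already have the text that some PDF library extracted for each page as a list of strings. The function name `needs_ocr` and the `min_chars_per_page` threshold are illustrative choices, not part of PDF4me or Zapier.

```python
from typing import Sequence


def needs_ocr(page_texts: Sequence[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF lacks a usable text layer.

    page_texts holds whatever text an extractor returned for each page.
    The threshold is an assumption; tune it for your documents.
    """
    if not page_texts:
        # Nothing was extracted at all: treat the file as image-based.
        return True
    # Count non-whitespace characters so blank lines and spaces don't count.
    total_chars = sum(len("".join(text.split())) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page


# A scanned page usually yields empty or near-empty strings.
print(needs_ocr(["", "  \n"]))
# A page with a real text layer yields plenty of characters.
print(needs_ocr(["Sample PDF Document\nLine 1\nLine 2\nLine 3"]))
```

A heuristic like this can misjudge pages that are mostly images with a short caption, so the copy test in a PDF reader remains the simplest way to confirm a single file.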


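What Do You Need?

  • A Zapier account.
  • A Dropbox account with a folder to watch (this guide uses /pdf4metest/ExtractText).
  • A PDF4me account and API key, connected to Zapier.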


The Zap in 3 Steps

  1. New File in Folder (Dropbox) — Watches /pdf4metest/ExtractText every 2 minutes.
  2. Make PDF Searchable / OCR (PDF4me) — Turns image-based PDFs into searchable documents.
  3. Extract Text From PDF (PDF4me) — Pulls the text into full Text, Trace Id, and Page Text Info.

Zapier workflow: Dropbox New File in Folder (every 2 minutes) → PDF4me Make PDF Searchable / OCR → PDF4me Extract Text From PDF.

Input: An image-based PDF (e.g. Image To PDF.pdf) with content like “Sample PDF Document” and numbered lines. Output: All text in full Text, ready for the next step.

Example input: an image-based PDF (“Sample PDF Document” with 10 numbered lines) that needs OCR before extraction.


Step 1: New File in Folder (Trigger)

Zap so far: Trigger only.

  1. Add New File in Folder (Dropbox) as the trigger.
  2. Set Space to Default (or choose your space).
  3. Set Folder * to /pdf4metest/ExtractText (or your target folder).
  4. Set Include files in subfolders? to False (or True for nested folders).
  5. Set Include file contents? to Yes. It must be Yes so the file content is passed to the next steps.
  6. Set Include sharing link? to Yes (optional).
  7. Click Continue and test the trigger.

Trigger settings: Space Default, Folder /pdf4metest/ExtractText, Include file contents Yes, Include sharing link Yes.

Important: Set Include file contents? to Yes. If it’s No, Zapier passes a reference instead of the file, and the OCR and Extract steps will fail.


Step 2: Make PDF Searchable / OCR (Action)

Zap so far: Trigger → Make PDF Searchable / OCR.

  1. Add Make PDF Searchable / OCR (PDF4me).
  2. For File *, map 1. File from the Dropbox trigger. Use the field with the actual file content, not “File: (Exists but not shown).”
  3. For File Name, map 1. File Name followed by 1. File Ext (e.g. Image To PDF + .pdf).
  4. Set Quality Type * to Standard for most PDFs.
  5. Choose False for Set Process as Async so the Zap waits for OCR to finish.
  6. Click Continue and test.

OCR settings: File from step 1, File Name Image To PDF.pdf, Quality Type Standard, Set Process as Async False.

The OCR step returns a searchable PDF. The Extract step uses that file.
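If you ever assemble the File Name value in code rather than by mapping two fields side by side, the join is simple but has a couple of edge cases. The sketch below is a hypothetical helper, not part of Zapier or PDF4me. It assumes the extension may arrive with or without its leading dot, and that the base name may already include the extension.

```python
def full_file_name(name: str, ext: str) -> str:
    """Join a base name and an extension into one file name."""
    name = name.strip()
    ext = ext.strip()
    if not ext:
        # No extension supplied: return the name unchanged.
        return name
    if not ext.startswith("."):
        ext = "." + ext
    # Avoid doubling the extension if the name already ends with it.
    if name.lower().endswith(ext.lower()):
        return name
    return name + ext


print(full_file_name("Image To PDF", ".pdf"))
```

The same function returns `Image To PDF.pdf` whether the extension is passed as `.pdf` or `pdf`, which matches the File Name shown in the example above.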


Step 3: Extract Text From PDF (Action)

Zap so far: Trigger → OCR → Extract Text.

  1. Add Extract Text From PDF (PDF4me).
  2. For File *, map 2. File Url from the OCR step (the processed PDF).
  3. For File Name, map 2. Full File Name (e.g. Image To PDF.pdf).
  4. Set Extract Mode * to Full Document for all text in one field, or Page-Wise for per-page output.
  5. Click Continue and test.

Extract settings: File Url and Full File Name from the OCR step, Extract Mode Full Document.

Output: The action returns full Text (all extracted text), Trace Id (e.g. 5e4ba591-94c4-4b6c-ac3f-ca062d483981), and Page Text Info. For a sample PDF like the one above, you’ll see “Sample PDF Document” plus lines 1 through 10 in full Text.
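OCR output often carries stray blank lines and padding, so it can help to tidy full Text before mapping it onward. Here is a minimal sketch of that cleanup, written as a plain function you could adapt for a code step. The sample string is invented for illustration, and the actual layout of full Text depends on your PDF.

```python
from typing import List


def text_lines(full_text: str) -> List[str]:
    """Split extracted text into trimmed, non-empty lines."""
    return [line.strip() for line in full_text.splitlines() if line.strip()]


# Invented sample; real OCR output will differ.
sample = "Sample PDF Document\n\n 1. First line \n2. Second line\n"
lines = text_lines(sample)
title, body = lines[0], lines[1:]
print(title)
print(body)
```

Treating the first non-empty line as a title is only a convenience for documents shaped like the sample. For anything less regular, work with the full list of lines instead.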


Quick Reference Table

Step    | Key settings                    | Notes
Trigger | Folder: /pdf4metest/ExtractText | Include file contents: Yes
OCR     | Quality Type: Standard          | Use Expert for poor scans (more API calls)
OCR     | Set Process as Async: False     | Keeps the Zap synchronous
Extract | Extract Mode: Full Document     | All text in one field
Output  | full Text                       | Map this to later steps

For full parameter details, see Extract Text from PDF — Zapier and PDF OCR — Zapier.


Troubleshooting

"File: (Exists but not shown)"

That option often passes a reference instead of the file. Pick the field that contains the actual file content. See Zapier and Power Automate Tips for file handling.

"File is Empty" or no text extracted

Ensure Include file contents? is Yes in the trigger, and that you map the OCR step’s File Url (or file output) into the Extract step, not a path or metadata.

401, 402, or other API errors

PDF4me Troubleshooting covers 401 (API key), 402 (credits), and more.
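If you log failures from your own scripts or a code step, a small lookup can turn a status code into a first thing to check. The sketch below covers only the two codes named above. The hint wording and the `status_hint` name are illustrative, not taken from PDF4me.

```python
# Hints for the two status codes mentioned in this guide.
STATUS_HINTS = {
    401: "Check the PDF4me API key used by the Zapier connection.",
    402: "Check the remaining credits on the PDF4me account.",
}

FALLBACK_HINT = "See PDF4me Troubleshooting for this status code."


def status_hint(status_code: int) -> str:
    """Return a first troubleshooting step for an API status code."""
    return STATUS_HINTS.get(status_code, FALLBACK_HINT)


print(status_hint(401))
```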


Try the API Yourself

You can also test the Extract Text API directly, without building a Zap.


Next Steps

You’re set.
  • Drop a test PDF into your folder and run the Zap. Check the Extract step output for full Text.
  • Add a step after Extract—e.g. Add Row to Google Sheets, Send Email, or Update Airtable—and map full Text into it.
  • Turn the Zap on so it runs when new PDFs appear in the folder.
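
If a later step needs the output reshaped, for example into one spreadsheet row per PDF, the mapping can be sketched as below. The column names are hypothetical, and `MAX_CELL_CHARS` is an assumed per-cell limit that you should adjust to whatever your destination accepts.

```python
from typing import Dict

# Assumed per-cell character limit; adjust for your destination app.
MAX_CELL_CHARS = 50_000


def build_row(
    file_name: str,
    full_text: str,
    trace_id: str,
    max_chars: int = MAX_CELL_CHARS,
) -> Dict[str, str]:
    """Shape the Extract step's output into one row of named columns."""
    text = full_text.strip()
    truncated = len(text) > max_chars
    return {
        "File Name": file_name,
        "Text": text[:max_chars],  # cut long text so the cell write doesn't fail
        "Trace Id": trace_id,      # keep for debugging, as noted earlier
        "Truncated": "yes" if truncated else "no",
    }


row = build_row("Image To PDF.pdf", " Sample PDF Document ", "example-trace-id")
print(row)
```

Recording whether the text was truncated makes it easy to spot rows where you need to go back to the original PDF.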