Extract Resources from PDF in Make

What this module does

The PDF4me Extract Resources module pulls the embedded images and text content out of a PDF and returns them as structured data in your Make scenario. Toggle Extract Images to retrieve every image embedded in the document as a named, Base64-encoded file. Toggle Extract Text to retrieve the full text content as a plain string. Both can run in a single module call. Use extracted images to build asset libraries, and use extracted text for analysis, search indexing, or translation, all without opening the PDF manually.

Authenticating Your API Request

Every PDF4me module in Make requires a valid Connection. Create or select one that holds your PDF4me API key so the scenario can authenticate extraction requests securely.

Important Facts You Should Not Miss

Images array needs an Iterator to process individually
The Images output is an array — one item per embedded image. To upload or process each image separately, add a Make Iterator module immediately after Extract Resources and point it at the Images array. Each iteration gives you one image with a Name field (filename) and a Data field (Base64 content). Without an Iterator, you cannot access individual images downstream.
Image Data is Base64 — decode before uploading
Each image's Data field is Base64-encoded. Cloud storage modules (Dropbox, Google Drive, SharePoint) expect binary data, not Base64 text. Use Make's built-in toBinary(Data, 'base64') function to convert the Data field to binary before piping it into an upload module. The Name field gives you the original filename with extension (e.g. image_001.png).
Text extraction works on native PDFs — not scanned
The Extract Text toggle retrieves searchable text embedded in the PDF structure. Scanned PDFs are image-based — they contain no embedded text layer, so the Texts output will be empty or minimal on scanned documents. To extract text from scanned pages, run a PDF OCR module first to create a text layer, then use Extract Resources to pull the text. Image extraction works on all PDFs including scanned ones.
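The scanned-PDF caveat above can be turned into a simple routing check. The Python sketch below is an illustration only and is not part of the module: it assumes you inspect the Texts output in your own script, and the needs_ocr helper and its min_chars threshold are hypothetical names and values chosen for this example.

```python
from typing import Optional


def needs_ocr(texts: Optional[str], min_chars: int = 20) -> bool:
    """Return True when the extracted text is empty or too short to be useful.

    min_chars is an arbitrary illustrative threshold, not a PDF4me setting.
    """
    stripped = (texts or "").strip()
    return len(stripped) < min_chars


# An empty Texts value suggests a scanned PDF that should go through OCR first.
print(needs_ocr(""))
# A native PDF normally yields enough text to skip OCR.
print(needs_ocr("Master Services Agreement between the parties listed below."))
```

Inside Make, the same idea would typically be expressed as a filter on the Texts field that routes the document to a PDF OCR module.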
Screenshot: the Make PDF4me Extract Resources module with Connection set to My PDF4me connection, File Name mapped from step 1, Document mapped from step 1 data, Extract Images set to Yes, and Extract Text set to Yes.

Extract Images and Extract Text are independent toggles: enable either one or both, depending on what you need. Images are returned as an array; text is returned as a plain string.

Parameters

Required: Connection, File Name, Document, Extract Images, and Extract Text must all be provided. Set at least one of Extract Images or Extract Text to Yes — enabling neither returns no content.

Parameter | Required | What it does | Example
Connection | Yes | PDF4me API connection. Click Add and paste your API key if connecting for the first time. | My PDF4me connection
File Name | Yes | Filename of the source PDF, including the .pdf extension. Map from the prior module's file name output. | catalog.pdf
Document | Yes | Binary content of the source PDF. Map from the prior module's data output (Dropbox, Google Drive, HTTP, or email attachment). | 1. Data
Extract Images | Yes | Toggle to extract all embedded images. Set to Yes to retrieve images as a named Base64 array. Set to No to skip image extraction. Can be combined with Extract Text. | Yes
Extract Text | Yes | Toggle to extract all text content. Set to Yes to retrieve the full document text as a plain string. Set to No to skip text extraction. Works on native PDFs only, not scanned ones. | Yes
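The parameter rules above can be written down as a small pre-flight check. This is a sketch under assumptions, not PDF4me's or Make's own validation: the ExtractResourcesInput class and its field names are hypothetical and simply mirror the parameter table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractResourcesInput:
    """Hypothetical mirror of the module's parameters, for illustration only."""
    file_name: str
    document: bytes
    extract_images: bool
    extract_text: bool

    def problems(self) -> List[str]:
        """Collect human-readable issues; an empty list means the input looks fine."""
        issues = []
        if not self.file_name.lower().endswith(".pdf"):
            issues.append("File Name should include the .pdf extension")
        if not self.document:
            issues.append("Document is empty")
        if not (self.extract_images or self.extract_text):
            issues.append("Enable Extract Images, Extract Text, or both")
        return issues


ok = ExtractResourcesInput("catalog.pdf", b"%PDF-1.7", True, False)
bad = ExtractResourcesInput("catalog", b"%PDF-1.7", False, False)
print(ok.problems())
print(bad.problems())
```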

Output Fields

Field | Type | Description
Texts | String | Full text content extracted from the PDF as a plain string. Empty if Extract Text is set to No or the PDF is scanned.
Images | Array | Array of all embedded images. Each item has a Name (filename with extension) and Data (Base64-encoded image content). Empty array if Extract Images is set to No.
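To make the output shape concrete, here is a minimal Python sketch of the two fields. The field names Texts, Images, Name, and Data come from the table above; the JSON envelope, the sample values, and the ImageItem and ExtractResult type names are assumptions made for illustration.

```python
import json
from typing import List, TypedDict


class ImageItem(TypedDict):
    Name: str  # filename with extension, e.g. image_001.png
    Data: str  # Base64-encoded image content


class ExtractResult(TypedDict):
    Texts: str
    Images: List[ImageItem]


# Hypothetical payload: text extraction was off, one image was found.
raw = '{"Texts": "", "Images": [{"Name": "image_001.png", "Data": "aGVsbG8="}]}'
result: ExtractResult = json.loads(raw)

# Texts is empty when Extract Text is No or the PDF is scanned.
has_text = bool(result["Texts"].strip())
# Images is an empty list when Extract Images is No.
image_names = [item["Name"] for item in result["Images"]]
print(has_text, image_names)
```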

Quick Setup

  1. Add the PDF4me Extract Resources module to your Make scenario after a file download step.
  2. Select Connection (or click Add to create one with your API key).
  3. Map File Name and Document from the prior module.
  4. Set Extract Images and/or Extract Text to Yes depending on what you need.
  5. If extracting images, add an Iterator module pointing at the Images array. Inside the iterator, use toBinary(Data, 'base64') to decode each image and pipe it into an upload module.
  6. If extracting text, map the Texts field directly into a Google Sheets row, database insert, or HTTP POST body.
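For readers who want to see what the Iterator and Base64 decoding steps amount to, the sketch below does the equivalent in plain Python: it loops over an Images array, decodes each Data field, and writes the bytes to a file named after the Name field. It assumes the array has already been obtained as a list of dictionaries; the save_images helper, the sample data, and the local destination folder are hypothetical stand-ins for an upload module.

```python
import base64
import tempfile
from pathlib import Path
from typing import Dict, List


def save_images(images: List[Dict[str, str]], dest: Path) -> List[Path]:
    """Decode each Base64 Data field and write it under dest using its Name."""
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for item in images:  # one pass per image, like one Iterator bundle
        # Keep only the final path component so a Name cannot escape dest.
        target = dest / Path(item["Name"]).name
        target.write_bytes(base64.b64decode(item["Data"]))
        written.append(target)
    return written


# Stand-in data; real Data values would hold actual image bytes.
images = [
    {"Name": "image_001.png", "Data": base64.b64encode(b"first").decode("ascii")},
    {"Name": "image_002.jpg", "Data": base64.b64encode(b"second").decode("ascii")},
]

with tempfile.TemporaryDirectory() as tmp:
    paths = save_images(images, Path(tmp) / "product-images")
    print([p.name for p in paths])
```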

Workflow Examples

Common Make scenario patterns using Extract Resources.
Build an image library from product catalog PDFs
  1. Dropbox Watch Folder triggers when a new catalog PDF is uploaded.
  2. Dropbox Download a File retrieves the PDF binary.
  3. Extract Resources runs with Extract Images set to Yes, Extract Text set to No.
  4. Iterator loops through the Images array — one image per iteration.
  5. Each iteration decodes the Data field with toBinary(Data, 'base64') and uploads the image to a Google Drive "Product Images" folder using the Name field as the filename.
Index contract text for full-text search
  1. Google Drive Watch Files triggers when a new contract PDF is added to the "Contracts" folder.
  2. Google Drive Get a File downloads the binary.
  3. Extract Resources runs with Extract Text set to Yes, Extract Images set to No.
  4. The Texts output is written to an Airtable record alongside the filename and upload date.
  5. The contract is now full-text searchable in Airtable — no manual copy-paste needed.
Migrate legacy PDFs — text to database, images to media library
  1. A Google Sheets row with a PDF URL triggers the scenario for each legacy document in the migration list.
  2. An HTTP module downloads the PDF from the URL.
  3. Extract Resources runs with both Extract Text and Extract Images set to Yes.
  4. The Texts field is inserted into a MySQL database record for full-text search.
  5. An Iterator uploads each extracted image to a SharePoint media library, completing the content migration without ever opening the PDFs manually.

Frequently Asked Questions

How do I save each extracted image to cloud storage?
Add a Make Iterator module after Extract Resources and point it at the Images array output. Each iteration yields one image object with a Name field (original filename with extension, e.g. image_001.png) and a Data field (Base64-encoded content). Convert Data to binary using Make's toBinary(Data, 'base64') function, then pipe the binary into a Dropbox Upload a File, Google Drive Upload a File, or SharePoint module — using the Name field as the destination filename.
Can I extract only images, only text, or both?
The two toggles are fully independent. Set Extract Images to Yes and Extract Text to No to retrieve only images. Set Extract Text to Yes and Extract Images to No for text only. Enable both to retrieve everything in a single module call — PDF4me processes the document once and returns both outputs together, which is more efficient than running the module twice.
Will text extraction work on scanned PDFs?
Scanned PDFs are essentially images of pages — they contain no embedded text layer. The Extract Text toggle retrieves text from the PDF's internal text structure, which only exists in digitally created (native) PDFs. For scanned documents, the Texts output will be empty or contain only minimal metadata. Run a PDF OCR module first to add a text layer to the scanned PDF, then use Extract Resources to pull the text.
What image formats are returned: PNG, JPG, or something else?
The format depends on how each image is stored inside the PDF. Most PDFs embed images as JPEG or PNG. The Name field in each Images array item includes the original extension (e.g. image_001.jpg or image_002.png) so you know the format before saving. The Data field is always Base64 regardless of the underlying format.
Does the module modify or damage the source PDF?
No. Extract Resources is a read-only operation. It reads the PDF content and returns copies of the embedded text and images — the source file in Dropbox, Google Drive, SharePoint, or any other location is completely untouched. You can run Extract Resources on the same PDF multiple times without any side effects.
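On the image-format question above: because each Name carries its extension, the format can be read from the filename without decoding the Data field. The Python sketch below shows one way to do that with the standard library; the describe helper and the sample filenames are hypothetical, and it assumes the extension in Name matches the actual image format.

```python
import mimetypes
from collections import Counter
from pathlib import Path
from typing import Optional, Tuple


def describe(name: str) -> Tuple[str, Optional[str]]:
    """Return the lowercase extension and the guessed MIME type for a Name."""
    ext = Path(name).suffix.lower()
    mime, _ = mimetypes.guess_type(name)
    return ext, mime


names = ["image_001.jpg", "image_002.png", "image_003.jpg"]

# Count how many extracted images use each extension.
counts = Counter(describe(n)[0] for n in names)
print(dict(counts))
print(describe("image_002.png"))
```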
