Extract Text from Word
The Extract Text from Word API extracts text from a Word document (.doc or .docx). Send the document as a Base64 string (docContent) together with docName, a page range (StartPageNumber, EndPageNumber), and three filtering flags (RemoveComments, RemoveHeaderFooter, AcceptChanges); async is optional. The API returns the extracted text as JSON or as a text file. Use the tester below to try it; the sections that follow cover the details.
Try the Extract Text from Word API
:::note Quick reference
Endpoint: POST /api/v2/ExtractTextFromWord · Required: api-key, docContent, docName, StartPageNumber, EndPageNumber, RemoveComments, RemoveHeaderFooter, AcceptChanges
:::
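As a rough illustration of the quick reference, the sketch below builds (but does not send) a POST request for this endpoint using only the Python standard library. The base URL, the JSON body encoding, and the `Authorization: Basic` header are assumptions that this page does not confirm, and `make_extract_request` is a hypothetical helper name. Check the PDF4me API docs for the actual host and authentication scheme.

```python
import json
import urllib.request

# Path taken from the quick reference above.
ENDPOINT_PATH = "/api/v2/ExtractTextFromWord"


def make_extract_request(base_url, api_key, payload):
    """Build a POST request object for the endpoint without sending it.

    base_url and the header layout are assumptions, not documented here.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT_PATH,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the key is passed as a Basic authorization value.
            "Authorization": f"Basic {api_key}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately keeps the sketch easy to inspect before any network call is made.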
:::info Try it live
Use the form below to send your API key, Word document (Base64), page range, and options (remove comments, remove header/footer, accept tracked changes). The response is JSON or text with the extracted content. No code required: fill in the fields and click Send request.
:::
Overview, parameters, and use cases
What is Extract Text from Word?
This endpoint extracts plain text from a Word document (.doc or .docx). You specify a page range with StartPageNumber and EndPageNumber and set three options: RemoveComments strips comments, RemoveHeaderFooter strips headers and footers, and AcceptChanges applies tracked changes. The API returns the extracted text as JSON or as a text file. Use it when you need plain text from Word for search, analysis, or migration.
Key features
- Page range – StartPageNumber and EndPageNumber limit extraction to specific pages.
- Content filtering – RemoveComments, RemoveHeaderFooter, AcceptChanges control what is included.
- Formats – Supports .doc and .docx input, sent as Base64.
- Async – Use async for large documents.
:::tip Best for
Use this endpoint when you need plain text from Word (e.g. for search, migration, or analysis). For PDF extraction, use Extract Resources or Extract Text by Expression.
:::
API parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| api-key | string | Yes | Your PDF4me API key, Base64 encoded. Get it from the dashboard. |
| docContent | base64 | Yes | Base64-encoded content of the Word document (.doc or .docx). |
| docName | string | Yes | Word file name (e.g. output). |
| StartPageNumber | integer | Yes | Starting page number. |
| EndPageNumber | integer | Yes | Ending page number. |
| RemoveComments | boolean | Yes | Remove comments from extracted text. |
| RemoveHeaderFooter | boolean | Yes | Remove header/footer from extracted text. |
| AcceptChanges | boolean | Yes | Accept tracked changes in the document. |
| async | boolean | No | Enable asynchronous processing. |
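To make the table concrete, here is a minimal sketch that assembles the request body from these parameters. The field names come from the table; sending them as a single JSON object is an assumption, as is the client-side check that the start page does not exceed the end page. `build_extract_payload` is a hypothetical helper, and api-key is left out because this page does not say how it is transmitted.

```python
import base64
import json


def build_extract_payload(doc_bytes, doc_name, start_page, end_page,
                          remove_comments=True, remove_header_footer=True,
                          accept_changes=True, run_async=False):
    """Assemble a request body using the parameter names from the table."""
    # Assumption: a reversed range is a caller error worth catching early.
    if start_page > end_page:
        raise ValueError("start_page must not exceed end_page")
    return {
        "docContent": base64.b64encode(doc_bytes).decode("ascii"),
        "docName": doc_name,
        "StartPageNumber": start_page,
        "EndPageNumber": end_page,
        "RemoveComments": remove_comments,
        "RemoveHeaderFooter": remove_header_footer,
        "AcceptChanges": accept_changes,
        "async": run_async,
    }


# Example with placeholder bytes standing in for a real Word file.
payload = build_extract_payload(b"placeholder", "sample.docx", 1, 3)
body = json.dumps(payload)  # JSON text ready to be sent
```

The default flag values here are illustrative only; the table marks all three flags as required, so a real call should set each one deliberately.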
When to use Extract Text from Word
- Content migration – Extract text from Word for import into CMS, search, or databases.
- Analysis – Get plain text for NLP, search indexing, or reporting.
- Page-specific – Extract only a page range (e.g. appendix, specific sections).
:::info Need the full API?
For request/response schemas and code samples, see Extract Text from Word in the PDF4me API docs.
:::
Prerequisites
Before using this endpoint, make sure you have:
- A valid PDF4me API key (Get your API Key)
- A Word document (.docx or .doc) in Base64 format
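The second prerequisite can be met with a few lines of standard-library Python. This is a sketch rather than an official helper: `encode_word_file` is a hypothetical name, and the extension check is an added convenience based on the formats listed above.

```python
import base64
from pathlib import Path


def encode_word_file(path):
    """Read a .doc or .docx file and return its content as a Base64 string."""
    path = Path(path)
    # Convenience check only; the API itself decides what it accepts.
    if path.suffix.lower() not in {".doc", ".docx"}:
        raise ValueError(f"expected a .doc or .docx file, got {path.name}")
    return base64.b64encode(path.read_bytes()).decode("ascii")
```

The returned string is what the docContent parameter expects; the file name (`path.name`) is a natural candidate for docName.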
Response Format
The API returns either a JSON response or a text file containing the text extracted from the specified page range.
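Because the response can take either form, client code may need to branch on it. The sketch below assumes that the `Content-Type` header distinguishes the two cases and that the body is UTF-8; neither is stated on this page. The JSON schema is not described here either, so the parsed object is returned unchanged rather than guessing at field names. `parse_extract_response` is a hypothetical helper.

```python
import json


def parse_extract_response(body, content_type):
    """Return parsed JSON for a JSON response, otherwise the decoded text."""
    text = body.decode("utf-8")  # assumption: UTF-8 encoded body
    if "json" in content_type.lower():
        return json.loads(text)
    return text
```

A caller would pass the raw response bytes and the value of the `Content-Type` header, then inspect the result against the schema in the PDF4me API docs.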