Extract Text From Word using n8n action
PDF4me Extract Text From Word extracts text content from Microsoft Word documents in n8n automation workflows. It accepts Word files as binary data from a trigger or previous node, as a base64 string, or as a public URL, and extracts text from a selected page range. Optional settings remove headers, footers, and comments and accept tracked changes, so the output is clean text ready for analysis and processing. Typical uses include content analysis, text mining, document digitization, editorial cleanup, data extraction, and automated text retrieval.
Setup
Add the PDF4me "Extract Text From Word" node to your n8n workflow and configure the required parameters. For initial setup instructions, see our n8n Integration Guide.
Prerequisites:
- PDF4me API credentials
- n8n workflow access
Configuration:
- Add PDF4me node to workflow
- Select "Extract Text From Word" action
- Configure input parameters (see below)

Parameters
Complete list of parameters for the Extract Text From Word action. Configure these parameters to control text extraction from Word documents.
Important: Parameters marked with asterisks (***) are required and must be provided for the action to function correctly.
| Parameter | Type | Description | Example |
|---|---|---|---|
| Input Data Type*** | String | Word Input Format Selection • Choose the format of your Word document input • PDF4me supports multiple input types • Options: Binary Data, Base64 String, or URL | Binary Data |
| Input Binary Field | Binary | Binary Word File Input (Required if Binary Data) • Reference Word file from previous n8n node or file upload • PDF4me processes binary Word files with automatic format detection • Required when Input Data Type is "Binary Data" | {{ $binary.data }} |
| Base64 Word Content | String | Base64 Encoded Word Input (Required if Base64 String) • Provide Word content as base64 encoded string • PDF4me automatically decodes and processes the Word content • Required when Input Data Type is "Base64 String" | UEsDBBQAAAAIAA... |
| Word URL | String | Public Word URL Input (Required if URL) • Provide a publicly accessible URL to the Word file • PDF4me downloads and processes the file from the URL • Required when Input Data Type is "URL" | https://abc.com/xyz.docx |
| Document Name*** | String | Input Filename • Specify the name of the input Word file • Used for format detection and processing optimization • Must include .docx extension | document.docx |
| Start Page Number*** | Number | Page Range Start • Define the starting page number for text extraction • Use 1 for the first page or specify desired starting page • Works with End Page Number to define extraction range | 1 |
| End Page Number*** | Number | Page Range End • Define the ending page number for text extraction • Use the last page number or specify desired ending page • Works with Start Page Number to define extraction range | 3 |
| Output Binary Field Name*** | String | Binary Data Mapping • Define the variable name for accessing generated text binary data • Used in subsequent workflow actions • Essential for workflow data flow | data |
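The conditional requirements in the table above can be checked before a workflow runs. The following is a minimal sketch, not part of the PDF4me node: the snake_case keys and the `validate_params` helper are hypothetical names that mirror the parameter labels, and the rule that the end page must not precede the start page is an assumption.

```python
import base64
import binascii

# Hypothetical snake_case keys that mirror the node's parameter labels;
# the real node stores these fields under its own internal names.
INPUT_FIELD_FOR_TYPE = {
    "Binary Data": "input_binary_field",
    "Base64 String": "base64_word_content",
    "URL": "word_url",
}


def validate_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the set looks usable."""
    problems = []

    # Exactly one input field is required, depending on Input Data Type.
    input_type = params.get("input_data_type")
    required_field = INPUT_FIELD_FOR_TYPE.get(input_type)
    if required_field is None:
        problems.append(f"unknown Input Data Type: {input_type!r}")
    elif not params.get(required_field):
        problems.append(
            f"{required_field} is required when Input Data Type is {input_type!r}"
        )

    # Reject strings that are not well-formed base64 before sending them.
    if input_type == "Base64 String" and params.get("base64_word_content"):
        try:
            base64.b64decode(params["base64_word_content"], validate=True)
        except (binascii.Error, ValueError):
            problems.append("base64_word_content is not valid base64")

    if not params.get("document_name", "").lower().endswith(".docx"):
        problems.append("Document Name must include the .docx extension")

    # Assumption: pages are 1-based and the end page must not precede the start.
    start = params.get("start_page_number")
    end = params.get("end_page_number")
    if not isinstance(start, int) or not isinstance(end, int):
        problems.append("Start and End Page Number must be integers")
    elif start < 1 or end < start:
        problems.append("page range must satisfy 1 <= start <= end")

    if not params.get("output_binary_field_name"):
        problems.append("Output Binary Field Name is required")

    return problems


example = {
    "input_data_type": "URL",
    "word_url": "https://abc.com/xyz.docx",
    "document_name": "document.docx",
    "start_page_number": 1,
    "end_page_number": 3,
    "output_binary_field_name": "data",
}
print(validate_params(example))  # prints: []
```

Returning a list of problems rather than raising on the first one lets a workflow report every misconfigured field at once.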
Extraction Options
The following parameters are available in the Extraction Options section and are optional:
| Parameter | Type | Description | Example |
|---|---|---|---|
| Remove Comments | Boolean | Comment Filtering • Enable or disable removal of comments and annotations • When enabled, removes all reviewer comments and editorial notes • Useful for clean text extraction | true |
| Remove Header/Footer | Boolean | Header/Footer Filtering • Enable or disable removal of headers and footers • When enabled, excludes page headers and footers • Ideal for content analysis workflows | true |
| Accept Changes | Boolean | Change Tracking Management • Enable or disable acceptance of tracked changes • When enabled, accepts all tracked changes so the extracted text reflects them • Essential for collaborative document processing | true |
Advanced Options
The following parameters are available in the Advanced Options section and are optional:
| Parameter | Type | Description | Example |
|---|---|---|---|
| Custom Profiles | String | Custom Configuration Profiles • Set additional options using custom profiles • JSON-like format containing predefined parameters • Enables advanced extraction processing settings • Optional for specialized requirements | { "outputDataFormat": "json" } |
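Because Custom Profiles is entered as a string, a malformed value is easy to miss. The sketch below assumes the value is strict JSON (the table only says "JSON-like") and uses a hypothetical helper, `parse_custom_profiles`, to check it before it is passed to the node.

```python
import json


def parse_custom_profiles(raw: str) -> dict:
    """Parse a Custom Profiles string, assuming it is strict JSON."""
    if not raw.strip():
        return {}  # the parameter is optional, so empty input means no overrides
    profile = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    if not isinstance(profile, dict):
        raise ValueError("Custom Profiles must be a JSON object")
    return profile


profile = parse_custom_profiles('{ "outputDataFormat": "json" }')
print(profile["outputDataFormat"])  # prints: json
```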
Output
Output Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| fileName | String | Generated filename of the extracted content file, including the extension and a timestamp that keeps names unique | word_text_extraction_1756999697398.json |
| mimeType | String | MIME type of the extracted content file, typically application/json for structured data or text/plain for raw text | application/json |
| fileSize | Number | Size of the extracted content file in bytes, useful for storage planning and file transfer monitoring | 2847 |
| success | Boolean | Extraction status flag: true when the extraction succeeds, false on any error. Use it for error handling in automated workflows | true |
| message | String | Status message describing the result of the extraction, with error details for troubleshooting when it fails | Text extraction from Word document completed successfully |
| docName | String | Original filename of the input Word file, kept for audit trails, debugging, and tracing extracted content back to its source | document.docx |
n8n Action Response
The PDF4me Extract Text From Word API returns a response that can be viewed in multiple formats. Choose the view that best fits your needs:
- JSON
- Table
- Schema
- Binary
JSON Response Format
The raw JSON response from the API:
{
  "fileName": "word_text_extraction_1756999697398.json",
  "mimeType": "application/json",
  "fileSize": 2847,
  "success": true,
  "message": "Text extraction from Word document completed successfully",
  "docName": "document.docx"
}
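A downstream step will usually branch on the `success` flag before using the file. The following is a minimal sketch of that check written as plain Python over the response shown above; `check_response` is a hypothetical helper, and the failure response is invented for illustration.

```python
import json

# The sample response from this section, parsed into a dict.
RESPONSE = json.loads("""
{
  "fileName": "word_text_extraction_1756999697398.json",
  "mimeType": "application/json",
  "fileSize": 2847,
  "success": true,
  "message": "Text extraction from Word document completed successfully",
  "docName": "document.docx"
}
""")


def check_response(resp: dict) -> str:
    """Return the generated file name, or raise with the reported message."""
    if not resp.get("success"):
        raise RuntimeError(
            f"Extraction failed for {resp.get('docName')}: {resp.get('message')}"
        )
    return resp["fileName"]


print(check_response(RESPONSE))  # prints: word_text_extraction_1756999697398.json

# Invented failure case: only the success flag and message differ.
failed = dict(RESPONSE, success=False, message="example error")
try:
    check_response(failed)
except RuntimeError as err:
    print(err)  # prints: Extraction failed for document.docx: example error
```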
Table View
Response data in a structured table format:
| Parameter | Value |
|---|---|
| fileName | word_text_extraction_1756999697398.json |
| mimeType | application/json |
| fileSize | 2847 |
| success | true |
| docName | document.docx |
| message | Text extraction from Word document completed successfully |
Schema View
The data structure and types of the response:
fileName: AB word_text_extraction_1756999697398.json
mimeType: AB application/json
fileSize: # 2847
success: ✓ true
docName: AB document.docx
message: AB Text extraction from Word document completed successfully
Type Indicators:
AB = String, # = Number, ✓ = Boolean
Binary Data View
The extracted text content file and its metadata:
data
─────────────────────────────────────────
File Name: word_text_extraction_1756999697398.json
File Extension: json
Mime Type: application/json
File Size: 2.8 KB
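The metadata in this view follows from the JSON response: the extension comes from `fileName`, and 2847 bytes is shown as 2.8 KB. The sketch below reproduces those values and decodes the file content. It assumes n8n's default in-memory mode, where a binary property holds its content as a base64 string under `data`, and it assumes 1 KB = 1024 bytes. The payload is an invented placeholder, because the structure of the extracted content is not documented here.

```python
import base64
import json
import os


def format_size(num_bytes: int) -> str:
    """Render a byte count in KB with one decimal, assuming 1 KB = 1024 bytes."""
    return f"{num_bytes / 1024:.1f} KB"


def file_extension(file_name: str) -> str:
    """Return the extension without the leading dot."""
    return os.path.splitext(file_name)[1].lstrip(".")


def decode_binary_property(binary_prop: dict):
    """Decode base64 content; parse it as JSON when the MIME type says so."""
    raw = base64.b64decode(binary_prop["data"])
    if binary_prop.get("mimeType") == "application/json":
        return json.loads(raw)
    return raw.decode("utf-8")


# Invented placeholder payload; real extraction output will look different.
placeholder = json.dumps({"placeholder": "extracted content goes here"}).encode("utf-8")
binary_prop = {
    "fileName": "word_text_extraction_1756999697398.json",
    "mimeType": "application/json",
    "data": base64.b64encode(placeholder).decode("ascii"),
}

print(file_extension(binary_prop["fileName"]))  # prints: json
print(format_size(2847))                        # prints: 2.8 KB
print(decode_binary_property(binary_prop))
```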
Use Cases
Content Management and Migration
- Extract clean text from Word documents for centralized content repositories
- Enable searchable databases and content analysis without formatting artifacts
- Migrate content from Word documents to modern content management systems
Collaborative Document Processing
- Process documents with tracked changes, comments, and editorial elements
- Extract final clean text for publication and distribution
- Handle collaborative editing workflows and version control
Document Analysis and Data Extraction
- Extract structured text data from Word reports for business intelligence
- Process legal documents while removing annotations and editorial markup
- Analyze academic papers and research documents for content insights