Extract Text From Word using n8n action

The PDF4me Extract Text From Word action extracts text from Microsoft Word documents in n8n workflows. It accepts Word files as binary data, a base64 string, or a public URL, and extracts text from a selected page range. Optionally, it removes headers, footers, and comments and accepts tracked changes, so the output contains only clean content. Typical uses include content analysis, text mining, document digitization, editorial cleanup, data extraction, and automated text retrieval.

Setup

Add the PDF4me "Extract Text From Word" node to your n8n workflow and configure the required parameters. For initial setup instructions, see our n8n Integration Guide.

Prerequisites:

  • PDF4me API credentials
  • n8n workflow access

Configuration:

  1. Add PDF4me node to workflow
  2. Select "Extract Text From Word" action
  3. Configure input parameters (see below)

Parameters

Complete list of parameters for the Extract Text From Word action. Configure these parameters to control text extraction from Word documents.

Important: Parameters marked with three asterisks (***) are required and must be provided for the action to work correctly.

Input Data Type***
  Type: String
  Description: Word Input Format Selection
  • Choose the format of your Word document input
  • PDF4me supports multiple input types
  • Options: Binary Data, Base64 String, or URL
  Example: Binary Data

Input Binary Field
  Type: Binary
  Description: Binary Word File Input (Required if Binary Data)
  • Reference Word file from previous n8n node or file upload
  • PDF4me processes binary Word files with automatic format detection
  • Required when Input Data Type is "Binary Data"
  Example: {{ $binary.data }}

Base64 Word Content
  Type: String
  Description: Base64 Encoded Word Input (Required if Base64 String)
  • Provide Word content as base64 encoded string
  • PDF4me automatically decodes and processes the Word content
  • Required when Input Data Type is "Base64 String"
  Example: UEsDBBQAAAAIAA...

Word URL
  Type: String
  Description: Public Word URL Input (Required if URL)
  • Provide a public/open permission URL to the Word file
  • PDF4me downloads and processes the file from URL
  • Required when Input Data Type is "URL"
  Example: https://abc.com/xyz.docx

Document Name***
  Type: String
  Description: Input Filename
  • Specify the name of the input Word file
  • Used for format detection and processing optimization
  • Must include .docx extension
  Example: document.docx

Start Page Number***
  Type: Number
  Description: Page Range Start
  • Define the starting page number for text extraction
  • Use 1 for the first page or specify desired starting page
  • Works with End Page Number to define extraction range
  Example: 1

End Page Number***
  Type: Number
  Description: Page Range End
  • Define the ending page number for text extraction
  • Use the last page number or specify desired ending page
  • Works with Start Page Number to define extraction range
  Example: 3

Output Binary Field Name***
  Type: String
  Description: Binary Data Mapping
  • Define the variable name for accessing generated text binary data
  • Used in subsequent workflow actions
  • Essential for workflow data flow
  Example: data
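
As a rough illustration of how the required and conditionally required parameters fit together, the sketch below checks a parameter set before it is entered into the node. It is written in Python for readability only. The function names `encode_docx` and `build_extract_params` and the dictionary layout are hypothetical and are not the node's internal field names. The input types, the conditional requirements, and the .docx rule come from the parameter list above; the page range checks (1-based, end not before start) are an assumption.

```python
import base64

INPUT_TYPES = ("Binary Data", "Base64 String", "URL")


def encode_docx(raw: bytes) -> str:
    """Return the base64 text for raw .docx bytes (the "Base64 String" input)."""
    return base64.b64encode(raw).decode("ascii")


def build_extract_params(input_type, document_name, start_page, end_page,
                         binary_field=None, base64_content=None, url=None,
                         output_field="data"):
    """Collect and sanity-check the parameters (hypothetical helper)."""
    if input_type not in INPUT_TYPES:
        raise ValueError(f"unknown input type: {input_type!r}")
    if not document_name.lower().endswith(".docx"):
        raise ValueError("Document Name must include the .docx extension")
    # Assumed range rules: pages are 1-based and the range is not inverted.
    if start_page < 1 or end_page < start_page:
        raise ValueError("invalid page range")

    # Each input type has one matching source field that must be present.
    field_name, field_value = {
        "Binary Data": ("Input Binary Field", binary_field),
        "Base64 String": ("Base64 Word Content", base64_content),
        "URL": ("Word URL", url),
    }[input_type]
    if not field_value:
        raise ValueError(
            f"{field_name} is required when Input Data Type is {input_type!r}"
        )

    return {
        "Input Data Type": input_type,
        field_name: field_value,
        "Document Name": document_name,
        "Start Page Number": start_page,
        "End Page Number": end_page,
        "Output Binary Field Name": output_field,
    }
```

A .docx file is a ZIP archive, so its bytes begin with `PK\x03\x04` and its base64 text begins with `UEsDB`, which is consistent with the example value shown for Base64 Word Content.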

Extraction Options

The following parameters are available in the Extraction Options section and are optional:

Remove Comments
  Type: Boolean
  Description: Comment Filtering
  • Enable or disable removal of comments and annotations
  • When enabled, removes all reviewer comments and editorial notes
  • Useful for clean text extraction
  Example: true

Remove Header/Footer
  Type: Boolean
  Description: Header/Footer Filtering
  • Enable or disable removal of headers and footers
  • When enabled, excludes page headers and footers
  • Ideal for content analysis workflows
  Example: true

Accept Changes
  Type: Boolean
  Description: Change Tracking Management
  • Enable or disable acceptance of tracked changes
  • When enabled, accepts all tracked changes so the extracted text reflects them
  • Essential for collaborative document processing
  Example: true

Advanced Options

The following parameters are available in the Advanced Options section and are optional:

Custom Profiles
  Type: String
  Description: Custom Configuration Profiles
  • Set additional options using custom profiles
  • JSON-like format containing predefined parameters
  • Enables advanced extraction processing settings
  • Optional for specialized requirements
  Example: { "outputDataFormat": "json" }
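
Because Custom Profiles is entered as a string, it can help to generate and check that string programmatically rather than typing it by hand. The sketch below assumes the field accepts strict JSON (this page only describes it as "JSON-like"). The helper names are hypothetical, and `outputDataFormat` is the only key documented here, so no other keys are shown.

```python
import json


def make_custom_profile(options: dict) -> str:
    """Serialize options to the text entered in the Custom Profiles field."""
    return json.dumps(options)


def check_custom_profile(text: str) -> dict:
    """Parse a Custom Profiles string; raise ValueError if it is not a JSON object."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Custom Profiles is not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("Custom Profiles must be a JSON object")
    return parsed


# The example value from the parameter description above.
profile = make_custom_profile({"outputDataFormat": "json"})
```

Running the check before saving the workflow catches common mistakes such as single quotes or trailing commas.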

Output

Output Parameters

fileName
  Type: String
  Description: PDF4me generated filename. The complete filename of the successfully processed document with proper extension and timestamp. PDF4me ensures unique naming and validates file format compliance for seamless integration with downstream processes.
  Example: word_text_extraction_1756999697398.json

mimeType
  Type: String
  Description: PDF4me MIME type identifier. The standardized MIME type for the extracted content file, typically application/json for structured data or text/plain for raw text. This ensures proper file handling and recognition across all systems and applications.
  Example: application/json

fileSize
  Type: Number
  Description: PDF4me file size in bytes. The exact size of the extracted content file in bytes, provided for storage planning, bandwidth optimization, and file transfer monitoring. Essential for enterprise document management and workflow automation.
  Example: 2847

success
  Type: Boolean
  Description: PDF4me extraction status indicator. Boolean flag indicating the success or failure of the text extraction process. Returns true for successful extractions and false for any errors, enabling robust error handling in automated workflows.
  Example: true

message
  Type: String
  Description: PDF4me extraction status message. Descriptive message indicating the result of the text extraction process. Provides clear status messages for successful extractions and detailed error information for troubleshooting purposes.
  Example: Text extraction from Word document completed successfully

docName
  Type: String
  Description: PDF4me original document name reference. The original filename of the input Word file that was processed. This reference is maintained for audit trails, debugging purposes, and tracking the source of extracted content in enterprise workflows.
  Example: document.docx

n8n Action Response

The PDF4me Extract Text From Word action returns a JSON response containing the output parameters listed above.

JSON Response Format

The raw JSON response from the API:

{
  "fileName": "word_text_extraction_1756999697398.json",
  "mimeType": "application/json",
  "fileSize": 2847,
  "success": true,
  "message": "Text extraction from Word document completed successfully",
  "docName": "document.docx"
}
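
A downstream step would typically check `success` before using the extracted file, then decode the content according to `mimeType`. The following is a minimal sketch of that logic, not part of the PDF4me node. The function names are hypothetical, UTF-8 encoding of the extracted file is an assumption, and the internal structure of the extracted JSON is not documented on this page, so it is returned as parsed without further interpretation.

```python
import json


def check_response(response: dict) -> dict:
    """Raise RuntimeError when the response reports a failed extraction."""
    if not response.get("success", False):
        raise RuntimeError(response.get("message", "text extraction failed"))
    return response


def decode_extracted(mime_type: str, payload: bytes):
    """Decode the extracted file content according to its MIME type."""
    text = payload.decode("utf-8")  # assumption: content is UTF-8 encoded
    if mime_type == "application/json":
        return json.loads(text)  # structured output, returned as parsed
    return text  # text/plain or anything else is returned as raw text


# The sample response shown above.
response = {
    "fileName": "word_text_extraction_1756999697398.json",
    "mimeType": "application/json",
    "fileSize": 2847,
    "success": True,
    "message": "Text extraction from Word document completed successfully",
    "docName": "document.docx",
}
check_response(response)
```

Raising on `success: false` and passing `message` through keeps the error text from the action visible in the workflow's error handling.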

Use Cases

Content Management and Migration

  • Extract clean text from Word documents for centralized content repositories
  • Enable searchable databases and content analysis without formatting artifacts
  • Migrate content from Word documents to modern content management systems

Collaborative Document Processing

  • Process documents with tracked changes, comments, and editorial elements
  • Extract final clean text for publication and distribution
  • Handle collaborative editing workflows and version control

Document Analysis and Data Extraction

  • Extract structured text data from Word reports for business intelligence
  • Process legal documents while removing annotations and editorial markup
  • Analyze academic papers and research documents for content insights
