
Extract Metadata From Word Document using n8n action

The PDF4me Extract Metadata From Word Document action extracts metadata and properties from Word documents in n8n automation workflows. It covers built-in and custom document properties, author information, creation dates, revision tracking, and document statistics such as page and word counts. Date and time values are formatted according to a configurable culture (locale). Typical uses are document management, compliance tracking, and content analysis workflows.

Setup

Add the PDF4me "Extract Metadata From Word Document" node to your n8n workflow and configure the required parameters. For initial setup instructions, see our n8n Integration Guide.

Prerequisites:

  • PDF4me API credentials
  • n8n workflow access

Configuration:

  1. Add PDF4me node to workflow
  2. Select "Extract Metadata From Word Document" action
  3. Configure input parameters (see below)

Parameters

Complete list of parameters for the Extract Metadata From Word Document action. Configure these parameters to control metadata extraction behavior.

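Important: Parameters marked with three asterisks (***) are required. The three input fields (Input Binary Field, Base64 Word Content, Word Document URL) are each required only when the matching Input Data Type is selected. Culture Name is optional and controls how date and time values are formatted.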

Parameter: Input Data Type***
Type: String
Description: Word Document Input Format Selection
  • Choose the format of your Word document data input
  • PDF4me supports multiple input types
  • Options: Binary Data, Base64 String, or URL
Example: Binary Data

Parameter: Input Binary Field***
Type: Binary
Description: Binary Word File Input (Required if Binary Data)
  • Reference Word file (.docx, .doc) from previous n8n node or file upload
  • PDF4me processes binary Word files with automatic format detection
  • Required when Input Data Type is "Binary Data"
Example: {{ $binary.data }}

Parameter: Base64 Word Content***
Type: String
Description: Base64 Encoded Word Input (Required if Base64 String)
  • Provide Word content (.docx, .doc) as base64 encoded string for secure transmission
  • PDF4me automatically decodes and processes the Word content
  • Required when Input Data Type is "Base64 String"
Example: UEsDBBQABgAI...

Parameter: Word Document URL***
Type: String
Description: Public Word Document URL Input (Required if URL)
  • Provide a public/open permission URL to the Word file (.docx, .doc) to be processed
  • PDF4me downloads and processes the Word file from the provided URL
  • Required when Input Data Type is "URL"
Example: https://abc.com/document.docx

Parameter: Word Document Name***
Type: String
Description: Word Document Input Filename
  • Specify the name of the input Word file with proper extension (.docx, .doc)
  • PDF4me uses this for format detection and processing optimization
Example: document.docx

Parameter: Culture Name
Type: String
Description: Document Culture/Locale
  • Culture code for date/time formatting (e.g., "en-US", "de-DE", "fr-FR")
  • Default: InvariantCulture (consistent formatting)
  • Affects date/time display format in metadata
Example: en-US
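The conditional requirements above can be sketched as a small validation helper. This is an illustrative Python sketch only, not part of PDF4me or n8n: the function name `build_input_params` and the dictionary keys are hypothetical and simply mirror the parameter labels in the list above. It also shows where the example value `UEsDBBQABgAI...` comes from: a .docx file is a ZIP archive, so its base64 encoding starts with the encoded ZIP signature.

```python
import base64

# Hypothetical mapping: the field each Input Data Type requires
# (labels copied from the parameter list, not from any real API schema).
REQUIRED_FIELD = {
    "Binary Data": "Input Binary Field",
    "Base64 String": "Base64 Word Content",
    "URL": "Word Document URL",
}


def build_input_params(input_type, document_name, value, culture_name=None):
    """Return a parameter dict, or raise ValueError if the setup is incomplete."""
    if input_type not in REQUIRED_FIELD:
        raise ValueError(f"Unknown Input Data Type: {input_type!r}")
    if not value:
        raise ValueError(f"{REQUIRED_FIELD[input_type]} is required for {input_type}")
    if not document_name.lower().endswith((".docx", ".doc")):
        raise ValueError("Word Document Name needs a .docx or .doc extension")
    params = {
        "Input Data Type": input_type,
        REQUIRED_FIELD[input_type]: value,
        "Word Document Name": document_name,
    }
    if culture_name:  # optional; when omitted the documented default applies
        params["Culture Name"] = culture_name
    return params


# The first bytes behind the example value: the ZIP signature "PK\x03\x04"
# followed by five more header bytes.
docx_header = b"PK\x03\x04\x14\x00\x06\x00\x08"
encoded = base64.b64encode(docx_header).decode("ascii")
params = build_input_params("Base64 String", "document.docx", encoded, "en-US")
print(encoded)
```

In a real workflow the full file content would be encoded, not just the header; the short byte string is used here only to keep the sketch self-contained.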

Output

Output Parameters

Parameter: fileName
Type: String
Description: Original Word document filename. The name of the input Word document file.
Example: myWordFile.docx

Parameter: success
Type: Boolean
Description: PDF4me operation status. True when the metadata extraction succeeded, false for any error.
Example: true

Parameter: cultureName
Type: String
Description: Culture/Locale code used for date/time formatting (e.g., "en-US", "de-DE", "fr-FR").
Example: en-US

Parameter: metadata
Type: Object
Description: Wrapper object for the extraction result. It carries its own document, fileName, success, and errorMessage fields plus a nested metadata object with the document properties: Author, Title, Subject, Keywords, Comments, Category, Company, Manager, Created, LastModified, LastPrinted, RevisionNumber, TotalEditingTime, Pages, Words, Characters, Paragraphs.
Example: {"metadata": {"Author": "Ynoox Testtwo", "Company": "HP Inc.", "Words": 921}}

Parameter: message
Type: String
Description: Operation message. Descriptive message indicating the result of the metadata extraction operation.
Example: Word metadata extracted successfully

n8n Action Response

The node outputs the response of the PDF4me Extract Metadata From Word Document API. Note that the document properties sit two levels deep, under metadata.metadata.

JSON Response Format

The raw JSON response from the API:

[
  {
    "fileName": "myWordFile.docx",
    "success": true,
    "cultureName": "en-US",
    "metadata": {
      "document": null,
      "fileName": null,
      "success": true,
      "errorMessage": null,
      "metadata": {
        "Author": "Ynoox Testtwo",
        "Title": "",
        "Subject": "",
        "Keywords": "",
        "Comments": "",
        "Category": "",
        "Company": "HP Inc.",
        "Manager": "",
        "Created": "5/16/2019 6:04:00 AM",
        "LastModified": "8/7/2024 7:04:00 AM",
        "LastPrinted": "5/16/2019 6:04:00 AM",
        "RevisionNumber": 5,
        "TotalEditingTime": 2,
        "Pages": 3,
        "Words": 921,
        "Characters": 5256,
        "Paragraphs": 12
      }
    },
    "message": "Word metadata extracted successfully"
  }
]
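Reading this response in a downstream step could look like the following Python sketch. It is illustrative only: `read_properties` is a hypothetical helper, the response is trimmed to a few fields from the sample above, and the date format string is an assumption that matches the en-US sample values. Other Culture Name settings would need a different format.

```python
import json
from datetime import datetime

# Trimmed copy of the sample response above.
response_text = """[{"fileName": "myWordFile.docx", "success": true, "cultureName": "en-US",
 "metadata": {"document": null, "fileName": null, "success": true, "errorMessage": null,
  "metadata": {"Author": "Ynoox Testtwo", "Company": "HP Inc.",
   "Created": "5/16/2019 6:04:00 AM", "LastModified": "8/7/2024 7:04:00 AM",
   "Pages": 3, "Words": 921}},
 "message": "Word metadata extracted successfully"}]"""

# Assumption: en-US output uses month/day/year with a 12-hour clock, as in the sample.
EN_US_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def read_properties(item):
    """Return the inner property dict, raising if either success flag is false."""
    wrapper = item.get("metadata") or {}
    if not item.get("success") or not wrapper.get("success"):
        reason = wrapper.get("errorMessage") or item.get("message") or "extraction failed"
        raise RuntimeError(reason)
    return wrapper["metadata"]


item = json.loads(response_text)[0]
props = read_properties(item)
created = datetime.strptime(props["Created"], EN_US_FORMAT)
words_per_page = props["Words"] / props["Pages"]
print(props["Author"], created.isoformat(), words_per_page)
```

Checking both the outer and the inner success flag is a defensive choice in this sketch; the source does not say whether the two can differ.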

Use Cases

Enterprise Document Automation

  • Document Management: Extract metadata for document organization and cataloging
  • Compliance Tracking: Track document properties for regulatory compliance
  • Content Analysis: Analyze document statistics and properties
  • Document Search: Use metadata for document search and retrieval

AI-Powered Document Processing

  • Multi-Format Support: Process Word documents in .docx and .doc formats
  • Multi-Language Processing: Support for different languages and regions
  • Intelligent Metadata Extraction: Automatic extraction of all document properties

Business Intelligence and Analytics

  • Document Analytics: Analyze document properties and statistics
  • Compliance Monitoring: Ensure documents meet metadata requirements
  • Performance Metrics: Track processing accuracy and efficiency
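As one way to picture the compliance monitoring use case, the sketch below flags documents whose extracted properties are missing required values. The policy (`REQUIRED_PROPERTIES`) and the helper `missing_properties` are hypothetical; only the property names come from the sample response.

```python
# Hypothetical policy: properties that must be present and non-blank.
REQUIRED_PROPERTIES = ("Author", "Title", "Company")


def missing_properties(props, required=REQUIRED_PROPERTIES):
    """Return the required property names that are absent or blank."""
    missing = []
    for name in required:
        value = props.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# Subset of the properties from the sample response, where Title is empty.
sample = {"Author": "Ynoox Testtwo", "Title": "", "Company": "HP Inc.", "Words": 921}
print(missing_properties(sample))
```

A workflow could route documents with a non-empty result to a review step and pass the rest through unchanged.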
