Extract Metadata from Word - Document Analysis API

PDF4me Extract Metadata from Word extracts metadata and properties from Word documents. The service processes a Word file and retrieves built-in document properties, custom properties, document statistics, author information, creation dates, and revision tracking. The API receives the document content through REST API calls, Base64-encoded so that the binary file can be carried in the JSON request body. With support for culture-specific date and time formatting, it suits document management, compliance tracking, content analysis, and enterprise document workflows.

Authenticating Your API Request

To access the PDF4me REST API, every request must include proper authentication credentials. Authentication ensures secure communication and validates your identity as an authorized user of the REST API.

Get your API Key

Key Features

Comprehensive Metadata Extraction: Retrieve all built-in and custom document properties
Document Statistics: Get real-time page count, word count, character count, and paragraph count
Author Information: Extract author, manager, company, and contact details
Date/Time Properties: Retrieve creation, modification, and printing timestamps
Custom Properties: Access all custom document properties with automatic prefixing
Culture Support: Format dates and times according to specified locale
Read-Only Operation: Extract metadata without modifying the original document
Structured Output: Return organized metadata in JSON format for easy processing

REST API Endpoint

The PDF4me REST API uses standard HTTP methods to interact with resources. All metadata extraction operations are performed through a single endpoint:

Method: POST
Endpoint: office/ApiV2Word/ExtractMetadata
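
As a rough illustration, a POST request to this endpoint might be assembled with the Python standard library as sketched below. The base URL is left to the caller because this page gives only the relative path, and `build_extract_request` is a hypothetical helper name. The request is only constructed here, not sent.

```python
import json
import urllib.request

ENDPOINT_PATH = "office/ApiV2Word/ExtractMetadata"  # relative path from this page


def build_extract_request(base_url, auth_header, payload):
    """Build (but do not send) a POST request for the ExtractMetadata endpoint.

    base_url is supplied by the caller (an assumption, not given on this page);
    auth_header is the full "Basic ..." value; payload is a dict that is
    serialized to JSON.
    """
    url = base_url.rstrip("/") + "/" + ENDPOINT_PATH
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": auth_header,
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request.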

REST API Parameters

Complete list of parameters for the Extract Metadata from Word REST API. Parameters are organized by category for better understanding and implementation.

Important: Parameters marked with an asterisk (*) are required and must be provided for the API to function correctly.

Required Parameters

Parameter	Type	Description	Example
document*	Object	Document reference. Must contain Name (string) — Word file name with .docx extension	`{ "Name": "document.docx" }`
docContent*	Base64	Word document content encoded in Base64	`UEsDBBQABgAIAAAA...`

Optional Parameters

Parameter	Type	Description	Example
cultureName	String	Culture code for date/time formatting (e.g., "en-US", "de-DE", "fr-FR"). Default: InvariantCulture (consistent formatting). Affects date/time display format in metadata. Invalid cultures fall back to InvariantCulture	`en-US`
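
The parameters above could be assembled into a request body along these lines. This is a minimal sketch: the field names come from the tables, while `build_payload` is a hypothetical helper and the file bytes are assumed to be already read into memory.

```python
import base64


def build_payload(file_name, file_bytes, culture_name=None):
    """Return the request body as a dict using the documented field names."""
    payload = {
        "document": {"Name": file_name},
        # Binary .docx content is Base64-encoded so it fits in a JSON string
        "docContent": base64.b64encode(file_bytes).decode("ascii"),
    }
    if culture_name is not None:
        # Optional; when omitted the service default applies
        payload["cultureName"] = culture_name
    return payload
```

Leaving `cultureName` out of the body entirely, rather than sending an empty value, keeps the request to the required fields only.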

Output

The PDF4me Extract Metadata from Word REST API returns different responses based on the processing mode. The API returns the metadata as a JSON object, not as binary data.


Synchronous Processing (Default)

The API processes the request and returns:

Status Code: 200 OK

Content-Type: application/json

Response Body:

{
  "document": null,
  "fileName": null,
  "success": true,
  "errorMessage": null,
  "metadata": {
    "Author": "John Doe",
    "Title": "Project Proposal",
    "Subject": "Q1 2024 Planning",
    "Keywords": "planning, budget, strategy",
    "Comments": "Draft version for review",
    "Category": "Business",
    "Company": "Acme Corporation",
    "Manager": "Jane Smith",
    "Created": "1/15/2024 10:30:00 AM",
    "LastModified": "1/20/2024 2:45:00 PM",
    "LastPrinted": "1/18/2024 9:15:00 AM",
    "RevisionNumber": 3,
    "TotalEditingTime": 120,
    "Pages": 8,
    "Words": 2150,
    "Characters": 12800,
    "Paragraphs": 45,
    "Custom_Department": "Marketing",
    "Custom_ProjectCode": "PRJ-2024-001",
    "Custom_Confidential": "Internal"
  }
}

Response Fields:

document (Base64 or null): Not used for metadata extraction; may be null
fileName (string or null): Not used for metadata extraction; may be null
success (boolean): Indicates whether the request succeeded
errorMessage (string or null): Error details when success is false
metadata (object): Comprehensive metadata dictionary containing all document properties

How to Use:

Extract the metadata field from the JSON response
Access individual properties as needed
Use metadata for document classification, compliance tracking, or analysis

Example (JavaScript):

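// url is the full ExtractMetadata endpoint URL; options holds the method, headers, and JSON body
const response = await fetch(url, options);
const data = await response.json();
// Stop early if the API reports a failure
if (!data.success) throw new Error(data.errorMessage);
const author = data.metadata.Author;
const pageCount = data.metadata.Pages;
// Any other metadata property can be read the same way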

Asynchronous Processing

Asynchronous behavior (202 Accepted with polling) is controlled by server configuration, not by a request body parameter. When enabled, the API may return a 202 status with a polling URL in the Location header. Poll the URL with GET requests until you receive 200 OK with the same response shape (including metadata, success, errorMessage).
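
The polling loop described above might look like the following sketch. Several details are assumptions: `fetch` is a caller-supplied callable that performs an authenticated GET and returns a `(status_code, parsed_json_body)` tuple, the API is assumed to keep answering 202 while the job is pending, and the interval and attempt limits are arbitrary.

```python
import time


def poll_for_result(fetch, location_url, interval_seconds=2.0, max_attempts=30):
    """Poll location_url until a 200 response arrives and return its body.

    fetch(url) must return (status_code, parsed_json_body); it stands in for
    an authenticated GET request and is not part of the API itself.
    """
    for attempt in range(max_attempts):
        status, body = fetch(location_url)
        if status == 200:
            return body
        if status != 202:
            # Assumption: anything other than 200/202 means polling should stop
            raise RuntimeError(f"Unexpected status while polling: {status}")
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"No result after {max_attempts} polling attempts")
```

Injecting `fetch` keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.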

Error Responses

The API returns standard HTTP error codes with error details:

400 Bad Request

Invalid request parameters
Missing required fields (document with Name, docContent)
Invalid Base64 encoding in docContent
Invalid or corrupted Word document

401 Unauthorized

Invalid or missing API key
API key not properly Base64 encoded in Authorization header
Missing Authorization: Basic header

500 Internal Server Error

Server-side processing error
Word document processing failure
Metadata extraction failure

Error Response Format:

{
  "error": "Error message describing what went wrong"
}
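
One way to handle both the HTTP error format and the `success` flag in a single place is sketched below. `ExtractMetadataError` and `metadata_from_response` are hypothetical names, and the body is assumed to be already parsed from JSON into a dict.

```python
class ExtractMetadataError(Exception):
    """Raised when the API reports a failure (hypothetical helper type)."""


def metadata_from_response(status_code, body):
    """Return the metadata dict, or raise with the most specific message available."""
    if status_code != 200:
        # Error responses are documented as {"error": "..."}
        message = body.get("error", "Unknown error") if isinstance(body, dict) else "Unknown error"
        raise ExtractMetadataError(f"HTTP {status_code}: {message}")
    if not body.get("success"):
        # A 200 response can still carry success=false with an errorMessage
        raise ExtractMetadataError(body.get("errorMessage") or "Request did not succeed")
    return body.get("metadata") or {}
```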

Response Format Details

Important: A successful (200 OK) response is always JSON containing a metadata object.

Response Structure:

{
  "document": "Base64 or null",
  "fileName": "string or null",
  "success": true,
  "errorMessage": "string or null",
  "metadata": {
    // Built-in properties
    "Author": "string",
    "Title": "string",
    "Subject": "string",
    "Keywords": "string",
    "Comments": "string",
    "Category": "string",
    "Company": "string",
    "Manager": "string",
    "Created": "string (formatted date/time)",
    "LastModified": "string (formatted date/time)",
    "LastPrinted": "string (formatted date/time)",
    "RevisionNumber": "integer",
    "TotalEditingTime": "integer (minutes)",
    
    // Document statistics
    "Pages": "integer",
    "Words": "integer",
    "Characters": "integer",
    "Paragraphs": "integer",
    
    // Custom properties (prefixed with Custom_)
    "Custom_PropertyName": "value"
  }
}

Content-Type Header:

Success: application/json
The metadata is returned as a structured JSON object

Accessing Metadata:

JavaScript/Node.js:

// response is the parsed JSON body (for example, the result of await res.json())
const metadata = response.metadata;
console.log(`Author: ${metadata.Author}`);
console.log(`Pages: ${metadata.Pages}`);
console.log(`Words: ${metadata.Words}`);

Python:

# response is the parsed JSON body as a dict (for example, from json.loads)
metadata = response['metadata']
print(f"Author: {metadata['Author']}")
print(f"Pages: {metadata['Pages']}")
print(f"Words: {metadata['Words']}")

C#:

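// response is the deserialized response object; Metadata is assumed to be a dictionary keyed by property name
var metadata = response.Metadata;
Console.WriteLine($"Author: {metadata["Author"]}");
Console.WriteLine($"Pages: {metadata["Pages"]}");
Console.WriteLine($"Words: {metadata["Words"]}");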

Metadata Properties

Built-in Document Properties

Property Name	Type	Description	Example
Author	String	Document author name	`"John Doe"`
Title	String	Document title	`"Project Proposal"`
Subject	String	Document subject	`"Q1 2024 Planning"`
Keywords	String	Document keywords (comma-separated)	`"planning, budget, strategy"`
Comments	String	Document comments	`"Draft version for review"`
Category	String	Document category	`"Business"`
Company	String	Company name	`"Acme Corporation"`
Manager	String	Manager name	`"Jane Smith"`
Created	String	Document creation date/time (formatted)	`"1/15/2024 10:30:00 AM"`
LastModified	String	Last saved date/time (formatted)	`"1/20/2024 2:45:00 PM"`
LastPrinted	String	Last printed date/time (formatted)	`"1/18/2024 9:15:00 AM"`
RevisionNumber	Integer	Document revision number	`3`
TotalEditingTime	Integer	Total editing time in minutes	`120`

Document Statistics

Property Name	Type	Description	Example
Pages	Integer	Total number of pages	`8`
Words	Integer	Total word count	`2150`
Characters	Integer	Total character count	`12800`
Paragraphs	Integer	Total paragraph count	`45`

Custom Properties

Custom properties are prefixed with Custom_ in the response:

Custom_Department - Custom department property
Custom_ProjectCode - Custom project code property
Custom_Confidential - Custom confidentiality level
Any other custom properties defined in the Word document
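
Because custom properties share one dictionary with the built-in ones, client code may want to separate them. A minimal sketch, assuming the `Custom_` prefix described above; `split_custom_properties` is a hypothetical helper name.

```python
CUSTOM_PREFIX = "Custom_"


def split_custom_properties(metadata):
    """Separate Custom_-prefixed entries from the rest, stripping the prefix."""
    built_in, custom = {}, {}
    for key, value in metadata.items():
        if key.startswith(CUSTOM_PREFIX):
            custom[key[len(CUSTOM_PREFIX):]] = value
        else:
            built_in[key] = value
    return built_in, custom
```

For the sample response shown earlier, `custom` would hold `Department`, `ProjectCode`, and `Confidential`.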

Supported Culture Examples

Culture Code	Description	Date Format Example
`en-US`	English (United States)	"1/15/2024 10:30:00 AM"
`en-GB`	English (United Kingdom)	"15/01/2024 10:30:00"
`de-DE`	German (Germany)	"15.01.2024 10:30:00"
`fr-FR`	French (France)	"15/01/2024 10:30:00"
`es-ES`	Spanish (Spain)	"15/01/2024 10:30:00"
`ja-JP`	Japanese (Japan)	"2024/01/15 10:30:00"
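
Since date values arrive as culture-formatted strings, a client that needs real date objects has to parse them with a matching pattern. The sketch below infers `strptime` patterns from the example column above; the mapping is an assumption based on those six examples, not an official or exhaustive list, and `parse_metadata_date` is a hypothetical helper.

```python
from datetime import datetime

# strptime patterns inferred from the examples in the table above (assumption)
CULTURE_DATE_FORMATS = {
    "en-US": "%m/%d/%Y %I:%M:%S %p",
    "en-GB": "%d/%m/%Y %H:%M:%S",
    "de-DE": "%d.%m.%Y %H:%M:%S",
    "fr-FR": "%d/%m/%Y %H:%M:%S",
    "es-ES": "%d/%m/%Y %H:%M:%S",
    "ja-JP": "%Y/%m/%d %H:%M:%S",
}


def parse_metadata_date(value, culture_name):
    """Parse a formatted date/time string using the pattern for culture_name."""
    return datetime.strptime(value, CULTURE_DATE_FORMATS[culture_name])
```

Parsing with the same culture code that was sent in `cultureName` avoids ambiguity between day-first and month-first formats.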

Request Example

Header

Content-Type: application/json
Authorization: Basic YOUR_BASE64_ENCODED_API_KEY

Note:

Get your API key from the PDF4me Dashboard

The API key must be Base64 encoded and prefixed with "Basic " in the Authorization header

Example: If your API key is abc123, encode it to Base64 and use Authorization: Basic YWJjMTIz
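
The encoding step in the note above can be sketched in a few lines of Python. `basic_auth_header` is a hypothetical helper name; the scheme (Base64 of the raw API key, prefixed with "Basic ") follows the note.

```python
import base64


def basic_auth_header(api_key):
    """Return the Authorization header value: "Basic " + Base64(api_key)."""
    encoded = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return f"Basic {encoded}"


print(basic_auth_header("abc123"))  # prints "Basic YWJjMTIz"
```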

Payload

Basic Example (Required Fields Only):

{
  "document": { "Name": "document.docx" },
  "docContent": "UEsDBBQABgAIAAAAIQDfpNJsWgEAACAFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC..."
}

Advanced Example (With Optional Fields):

{
  "document": { "Name": "document.docx" },
  "docContent": "UEsDBBQABgAIAAAAIQDfpNJsWgEAACAFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC...",
  "cultureName": "en-US"
}

Code Samples

The PDF4me Extract Metadata from Word REST API provides code samples in multiple programming languages. Choose the language that best fits your development environment:

C# (CSharp) Sample

Complete C# implementation for extracting metadata from Word:

View C# Sample

Industry Use Cases & Applications

Legal & Professional Services
Business & Enterprise
Education & Research
Finance & Banking

Legal & Professional Services Use Cases

Document Classification: Classify legal documents by case type, client, or practice area
Compliance Tracking: Monitor document creation dates and revision history for regulatory compliance
Client Document Management: Track document properties for client file organization
Audit Trail Maintenance: Log document metadata for legal audit requirements

Get Help

Get Your API Key
API Samples
Getting Started
