Extract Metadata from Word - Document Analysis API

PDF4me Extract Metadata from Word extracts metadata and properties from Word documents. The service processes a Word file and retrieves built-in document properties, custom properties, document statistics, author information, creation dates, and revision information. Document content is sent through REST API calls as a Base64-encoded string; Base64 is a transport encoding, not encryption. With support for culture-specific date and time formatting, the API suits document management, compliance tracking, content analysis, and enterprise document workflows.

Authenticating Your API Request

To access the PDF4me REST API, every request must include proper authentication credentials. Authentication ensures secure communication and validates your identity as an authorized user of the REST API.

Key Features

  • Comprehensive Metadata Extraction: Retrieve all built-in and custom document properties
  • Document Statistics: Get page count, word count, character count, and paragraph count
  • Author Information: Extract author, manager, company, and contact details
  • Date/Time Properties: Retrieve creation, modification, and printing timestamps
  • Custom Properties: Access all custom document properties with automatic prefixing
  • Culture Support: Format dates and times according to specified locale
  • Read-Only Operation: Extract metadata without modifying the original document
  • Structured Output: Return organized metadata in JSON format for easy processing

REST API Endpoint

The PDF4me REST API uses standard HTTP methods to interact with resources. All metadata extraction operations are performed through a single endpoint:

  • Method: POST
  • Endpoint: office/ApiV2Word/ExtractMetadata

REST API Parameters

Complete list of parameters for the Extract Metadata from Word REST API. Parameters are organized by category for better understanding and implementation.

Important: Parameters marked with an asterisk (*) are required and must be provided for the API to function correctly.

Required Parameters

Parameter | Type | Description | Example
--- | --- | --- | ---
document* | Object | Document reference. Must contain Name (string): the Word file name with .docx extension | { "Name": "document.docx" }
docContent* | Base64 | Word document content encoded in Base64 | UEsDBBQABgAIAAAA...
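As an illustration of the two required fields, the following Python sketch builds the request body from raw file bytes. The helper name build_payload is hypothetical, and the four sample bytes are only the ZIP signature that every .docx file starts with, not a valid document.

```python
import base64


def build_payload(file_name: str, file_bytes: bytes) -> dict:
    """Return the two required fields for an ExtractMetadata request."""
    return {
        "document": {"Name": file_name},
        # docContent is the raw .docx bytes encoded as Base64 text
        "docContent": base64.b64encode(file_bytes).decode("ascii"),
    }


# A .docx file is a ZIP archive, so its bytes begin with "PK\x03\x04".
# In practice, read the bytes from disk: open(path, "rb").read()
payload = build_payload("document.docx", b"PK\x03\x04")
```

This is why real docContent values typically begin with "UEsD", as in the example column.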

Optional Parameters

Parameter | Type | Description | Example
--- | --- | --- | ---
cultureName | String | Culture code for date/time formatting (e.g., "en-US", "de-DE", "fr-FR"). Default: InvariantCulture (consistent formatting). Affects the date/time display format in metadata. Invalid cultures fall back to InvariantCulture | en-US

Output

The PDF4me Extract Metadata from Word REST API returns different responses based on the processing mode. The API returns the metadata as a JSON object, not as binary data.

Synchronous Processing (Default)

The API processes the request and returns:

Status Code: 200 OK

Content-Type: application/json

Response Body:

{
  "document": null,
  "fileName": null,
  "success": true,
  "errorMessage": null,
  "metadata": {
    "Author": "John Doe",
    "Title": "Project Proposal",
    "Subject": "Q1 2024 Planning",
    "Keywords": "planning, budget, strategy",
    "Comments": "Draft version for review",
    "Category": "Business",
    "Company": "Acme Corporation",
    "Manager": "Jane Smith",
    "Created": "1/15/2024 10:30:00 AM",
    "LastModified": "1/20/2024 2:45:00 PM",
    "LastPrinted": "1/18/2024 9:15:00 AM",
    "RevisionNumber": 3,
    "TotalEditingTime": 120,
    "Pages": 8,
    "Words": 2150,
    "Characters": 12800,
    "Paragraphs": 45,
    "Custom_Department": "Marketing",
    "Custom_ProjectCode": "PRJ-2024-001",
    "Custom_Confidential": "Internal"
  }
}

Response Fields:

  • document (Base64 or null): Not used for metadata extraction; may be null
  • fileName (string or null): Not used for metadata extraction; may be null
  • success (boolean): Indicates whether the request succeeded
  • errorMessage (string or null): Error details when success is false
  • metadata (object): Comprehensive metadata dictionary containing all document properties

How to Use:

  1. Extract the metadata field from the JSON response
  2. Access individual properties as needed
  3. Use metadata for document classification, compliance tracking, or analysis

Example (JavaScript):

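// `url` and `options` are assumed to be defined by the caller
// (endpoint URL, POST method, headers, and JSON body).
const response = await fetch(url, options);
const data = await response.json();
// Check the success flag before reading metadata
if (!data.success) {
  throw new Error(data.errorMessage ?? "Metadata extraction failed");
}
const author = data.metadata.Author;
const pageCount = data.metadata.Pages;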
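The same steps can be sketched in Python. This is a minimal example that assumes the response body has already been read into a string; the function name read_metadata and the choice of RuntimeError are illustrative, not part of the API.

```python
import json


def read_metadata(response_body: str) -> dict:
    """Parse a response body and return its metadata dictionary."""
    data = json.loads(response_body)
    # success is false when extraction failed; errorMessage carries the details
    if not data.get("success"):
        raise RuntimeError(data.get("errorMessage") or "Metadata extraction failed")
    return data.get("metadata") or {}


# Shortened version of the sample response shown in this section
sample = (
    '{"document": null, "fileName": null, "success": true, '
    '"errorMessage": null, "metadata": {"Author": "John Doe", "Pages": 8}}'
)
metadata = read_metadata(sample)
author = metadata["Author"]
page_count = metadata["Pages"]
```

Checking success first keeps a failed request from surfacing later as a missing-key error.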

Metadata Properties

Built-in Document Properties

Property Name | Type | Description | Example
--- | --- | --- | ---
Author | String | Document author name | "John Doe"
Title | String | Document title | "Project Proposal"
Subject | String | Document subject | "Q1 2024 Planning"
Keywords | String | Document keywords (comma-separated) | "planning, budget, strategy"
Comments | String | Document comments | "Draft version for review"
Category | String | Document category | "Business"
Company | String | Company name | "Acme Corporation"
Manager | String | Manager name | "Jane Smith"
Created | String | Document creation date/time (formatted) | "1/15/2024 10:30:00 AM"
LastModified | String | Last saved date/time (formatted) | "1/20/2024 2:45:00 PM"
LastPrinted | String | Last printed date/time (formatted) | "1/18/2024 9:15:00 AM"
RevisionNumber | Integer | Document revision number | 3
TotalEditingTime | Integer | Total editing time in minutes | 120

Document Statistics

Property Name | Type | Description | Example
--- | --- | --- | ---
Pages | Integer | Total number of pages | 8
Words | Integer | Total word count | 2150
Characters | Integer | Total character count | 12800
Paragraphs | Integer | Total paragraph count | 45

Custom Properties

Custom properties are prefixed with Custom_ in the response:

  • Custom_Department - Custom department property
  • Custom_ProjectCode - Custom project code property
  • Custom_Confidential - Custom confidentiality level
  • Any other custom properties defined in the Word document
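Because custom properties share the Custom_ prefix, they can be separated from the built-in ones on the client side. The sketch below shows one way to do that; the helper name split_custom_properties is hypothetical.

```python
from typing import Dict, Tuple

CUSTOM_PREFIX = "Custom_"


def split_custom_properties(metadata: Dict) -> Tuple[Dict, Dict]:
    """Split a metadata dictionary into (built_in, custom) dictionaries.

    Keys in the custom dictionary have the Custom_ prefix removed.
    """
    built_in = {k: v for k, v in metadata.items() if not k.startswith(CUSTOM_PREFIX)}
    custom = {
        k[len(CUSTOM_PREFIX):]: v
        for k, v in metadata.items()
        if k.startswith(CUSTOM_PREFIX)
    }
    return built_in, custom


built_in, custom = split_custom_properties(
    {"Author": "John Doe", "Pages": 8, "Custom_Department": "Marketing"}
)
```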

Supported Culture Examples

Culture Code | Description | Date Format Example
--- | --- | ---
en-US | English (United States) | "1/15/2024 10:30:00 AM"
en-GB | English (United Kingdom) | "15/01/2024 10:30:00"
de-DE | German (Germany) | "15.01.2024 10:30:00"
fr-FR | French (France) | "15/01/2024 10:30:00"
es-ES | Spanish (Spain) | "15/01/2024 10:30:00"
ja-JP | Japanese (Japan) | "2024/01/15 10:30:00"
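Since date values arrive as culture-formatted strings, a client that needs real timestamps has to parse them with the matching pattern. The following sketch assumes the patterns can be inferred from the example column above; the CULTURE_FORMATS mapping and parse_timestamp helper are illustrative and cover only the six listed cultures.

```python
from datetime import datetime

# strptime patterns inferred from the examples in the table above (an assumption,
# not an official format specification)
CULTURE_FORMATS = {
    "en-US": "%m/%d/%Y %I:%M:%S %p",
    "en-GB": "%d/%m/%Y %H:%M:%S",
    "de-DE": "%d.%m.%Y %H:%M:%S",
    "fr-FR": "%d/%m/%Y %H:%M:%S",
    "es-ES": "%d/%m/%Y %H:%M:%S",
    "ja-JP": "%Y/%m/%d %H:%M:%S",
}


def parse_timestamp(value: str, culture_name: str) -> datetime:
    """Parse a formatted metadata date using the pattern for the given culture."""
    return datetime.strptime(value, CULTURE_FORMATS[culture_name])


created_us = parse_timestamp("1/15/2024 10:30:00 AM", "en-US")
created_de = parse_timestamp("15.01.2024 10:30:00", "de-DE")
```

Parsing with the same culture that was sent in cultureName avoids day/month ambiguity between formats such as en-US and en-GB.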

Request Example

Content-Type: application/json
Authorization: Basic YOUR_BASE64_ENCODED_API_KEY

Note:

  • Get your API key from the PDF4me Dashboard
  • The API key must be Base64 encoded and prefixed with "Basic " in the Authorization header
  • Example: If your API key is abc123, encode it to Base64 and use Authorization: Basic YWJjMTIz
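The encoding step in the note above can be reproduced in a few lines of Python. This is a small sketch using the placeholder key abc123 from the example; the helper name basic_auth_headers is hypothetical.

```python
import base64


def basic_auth_headers(api_key: str) -> dict:
    """Build the request headers from a plain-text API key."""
    # Base64-encode the key and prefix it with "Basic "
    encoded = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Basic {encoded}",
    }


headers = basic_auth_headers("abc123")
```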

Payload

Basic Example (Required Fields Only):

{
  "document": { "Name": "document.docx" },
  "docContent": "UEsDBBQABgAIAAAAIQDfpNJsWgEAACAFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC..."
}

Advanced Example (With Optional Fields):

{
  "document": { "Name": "document.docx" },
  "docContent": "UEsDBBQABgAIAAAAIQDfpNJsWgEAACAFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC...",
  "cultureName": "en-US"
}
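To show how the headers, endpoint, and payload fit together, here is a hedged Python sketch that assembles the POST request with the standard library. The base URL is an assumption (this section gives only the relative endpoint), and build_request is a hypothetical helper; the request is built but not sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.pdf4me.com/"  # assumption: base URL is not stated in this section
ENDPOINT = "office/ApiV2Word/ExtractMetadata"


def build_request(
    payload: dict, authorization: str, culture_name: Optional[str] = None
) -> urllib.request.Request:
    """Assemble the POST request; cultureName is added only when provided."""
    body = dict(payload)
    if culture_name:
        body["cultureName"] = culture_name
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": authorization},
        method="POST",
    )


request = build_request(
    {"document": {"Name": "document.docx"}, "docContent": "UEsDBA=="},
    "Basic YOUR_BASE64_ENCODED_API_KEY",
    culture_name="en-US",
)
# To send it: with urllib.request.urlopen(request) as resp: data = json.load(resp)
```

Omitting culture_name leaves cultureName out of the body, so the documented InvariantCulture default applies.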

Code Samples

The PDF4me Extract Metadata from Word REST API provides code samples in multiple programming languages. Choose the language that best fits your development environment:

C# (CSharp) Sample

Complete C# implementation for extracting metadata from Word:

Industry Use Cases & Applications

Legal & Professional Services Use Cases

  • Document Classification: Classify legal documents by case type, client, or practice area
  • Compliance Tracking: Monitor document creation dates and revision history for regulatory compliance
  • Client Document Management: Track document properties for client file organization
  • Audit Trail Maintenance: Log document metadata for legal audit requirements