Classify Document
PDF4me Classify Document classifies and identifies documents based on their content. The service analyzes a PDF file's structure, content patterns, and metadata to determine its document type and category. PDF content is sent through REST API calls as a Base64-encoded string, which lets the binary file travel inside a JSON request body; Base64 is an encoding, not encryption, so transport security comes from HTTPS and the authentication header. With support for document type identification and automated categorization, the API suits document management systems and automated workflows.
Authenticating Your API Request
To access the PDF4me REST API, every request must include proper authentication credentials. Authentication ensures secure communication and validates your identity as an authorized user of the REST API.
Key Features
- Document Classification: Identify and classify document types based on content analysis
- Content Analysis: Analyze document structure, content patterns, and metadata
- Multiple Document Types: Support for various PDF document formats
- Automated Processing: Streamlined document classification without manual intervention
- Simple API Integration: RESTful API designed for automated document processing workflows
REST API Endpoint
The PDF4me REST API uses standard HTTP methods to interact with resources. All document classification operations are performed through a single endpoint:
- Method: POST
- Endpoint: /api/v2/ClassifyDocument
REST API Parameters
This section lists every parameter of the Classify Document REST API, grouped into required and optional parameters.
Important: Parameters marked with an asterisk (*) are required and must be provided for the API to function correctly.
Required Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| docContent* | Base64 (String) | The content of the input PDF file encoded in Base64 format for document analysis and classification processing | JVBERi... |
| docName* | String | Source PDF file name with proper .pdf extension for document identification and processing | output.pdf |
Optional Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| async | Boolean | Enable asynchronous processing. When true, the API returns 202 Accepted with a Location header for polling the result | true |
Output
The PDF4me Classify Document REST API returns different responses based on the processing mode. The API returns classification data as a JSON response.
Synchronous Processing (Default)
When async is false or not provided, the API returns the classification results immediately.
Status Code: 200 OK
Response Format:
{
  "documentType": "invoice",
  "category": "financial",
  "confidence": 0.95,
  "metadata": {
    "pageCount": 1,
    "createdDate": "2024-01-15T10:30:00Z"
  }
}
The response contains classification data including document type, category, confidence score, and metadata information.
Asynchronous Processing
When async is true, the API processes the document asynchronously.
Initial Response:
Status Code: 202 Accepted
Response Headers:
Location: https://api.pdf4me.com/api/v2/ParseDocumentStatus/{operationId}
Response Body:
{
  "traceId": "operation-trace-id"
}
Polling for Results:
Use the Location header URL to poll for completion:
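// apiKey is your Base64-encoded API key.
// Assumption: the status URL keeps returning 202 while processing is still running.
let response;
do {
  await new Promise(resolve => setTimeout(resolve, 2000)); // wait 2 s between polls
  response = await fetch(locationUrl, {
    headers: { 'Authorization': 'Basic ' + apiKey }
  });
} while (response.status === 202);
// Polling stops once the status is no longer 202; 200 means the result is ready
if (response.status === 200) {
  const result = await response.json();
  // Process classification results
}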
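The polling step can also be bounded so that a stuck operation does not loop forever. The following Python sketch is one possible shape, not PDF4me's own client: `poll_for_result`, `fetch_status`, `interval`, and `max_attempts` are hypothetical names, and it assumes the status URL answers 202 while the document is still being processed.

```python
import time


def poll_for_result(fetch_status, location_url, interval=2.0, max_attempts=30,
                    sleep=time.sleep):
    """Call fetch_status(location_url) until it reports 200 or attempts run out.

    fetch_status is a caller-supplied function returning (status_code, body),
    so any HTTP library can be plugged in.
    """
    for attempt in range(max_attempts):
        status, body = fetch_status(location_url)
        if status == 200:
            return body  # classification result is ready
        if status != 202:
            # Assumption: anything other than 200/202 is a failure.
            raise RuntimeError(f"Polling failed with status {status}")
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"No result after {max_attempts} attempts")


# Stand-in transport that reports "still processing" twice, then succeeds.
_replies = iter([(202, None), (202, None), (200, {"documentType": "invoice"})])
print(poll_for_result(lambda url: next(_replies),
                      "https://example.invalid/status",
                      sleep=lambda seconds: None))
```

Passing the transport in as a function keeps the retry logic independent of the HTTP client and makes it easy to exercise without network access.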
Error Responses
| Status Code | Description | Example Response |
|---|---|---|
| 400 Bad Request | Invalid request parameters or missing required fields | {"error": "Missing required parameter: docContent"} |
| 401 Unauthorized | Invalid or missing API key | {"error": "Unauthorized"} |
| 408 Request Timeout | Request processing timeout | {"error": "Request timeout"} |
| 500 Internal Server Error | Server error during processing | {"error": "Internal server error"} |
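A client will usually want to turn these status codes into exceptions. The sketch below shows one way to do that in Python, assuming error bodies have the `{"error": "..."}` shape shown in the table. `ClassifyError`, `check_response`, and the choice to treat 408 and 500 as retryable are illustrative assumptions, not part of the PDF4me API.

```python
import json

# Assumption: timeouts and server errors are worth retrying; 400 and 401 are not.
RETRYABLE_STATUSES = {408, 500}


class ClassifyError(Exception):
    """Raised when the API answers with a non-success status code."""

    def __init__(self, status, message, retryable):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.retryable = retryable


def check_response(status, body):
    """Return None for 200/202; raise ClassifyError for any other status."""
    if status in (200, 202):
        return None
    try:
        message = json.loads(body).get("error", "Unknown error")
    except (ValueError, TypeError, AttributeError):
        # Body was not a JSON object; fall back to the raw text.
        message = body or "Unknown error"
    raise ClassifyError(status, message, retryable=status in RETRYABLE_STATUSES)


try:
    check_response(400, '{"error": "Missing required parameter: docContent"}')
except ClassifyError as exc:
    print(exc.status, exc.message, exc.retryable)
```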
Understanding the JSON Response
The classification response is a JSON object containing structured data about the document:
- documentType: The identified type of document (e.g., "invoice", "contract", "report")
- category: The document category (e.g., "financial", "legal", "business")
- confidence: A score indicating the confidence level of the classification (0.0 to 1.0)
- metadata: Additional document information such as page count, creation date, etc.
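As an illustration of working with these fields, the following Python sketch parses a response body into a small typed object. The `Classification` class, `parse_classification`, and the 0.8 threshold are hypothetical choices made for this example; only the JSON field names come from the response format described here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Classification:
    document_type: str
    category: str
    confidence: float
    metadata: dict = field(default_factory=dict)

    def is_confident(self, threshold=0.8):
        """True when the score meets the (arbitrary, illustrative) threshold."""
        return self.confidence >= threshold


def parse_classification(body):
    """Map the JSON response body onto a Classification instance."""
    data = json.loads(body)
    return Classification(
        document_type=data["documentType"],
        category=data["category"],
        confidence=float(data["confidence"]),
        metadata=data.get("metadata") or {},
    )


sample = (
    '{"documentType": "invoice", "category": "financial", "confidence": 0.95, '
    '"metadata": {"pageCount": 1, "createdDate": "2024-01-15T10:30:00Z"}}'
)
result = parse_classification(sample)
print(result.document_type, result.category, result.is_confident())
```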
Decoding Base64 Content (if present):
If the response includes Base64-encoded content, decode it using:
// JavaScript
const decoded = atob(base64String);
# Python
import base64
decoded = base64.b64decode(base64_string).decode('utf-8')
Request Example
Header
Content-Type: application/json
Authorization: Basic YOUR_BASE64_ENCODED_API_KEY
Note:
- Get your API key from the PDF4me Dashboard
- The API key must be Base64 encoded and prefixed with "Basic " in the Authorization header
- Example: If your API key is abc123, encode it to Base64 and use Authorization: Basic YWJjMTIz
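The encoding step from the note can be sketched in Python as follows. `build_auth_header` is a hypothetical helper name; the sketch assumes the raw key is plain text and uses only the standard library.

```python
import base64


def build_auth_header(api_key):
    """Return the Authorization header for a raw (not yet encoded) API key."""
    encoded = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + encoded}


print(build_auth_header("abc123"))  # {'Authorization': 'Basic YWJjMTIz'}
```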
Payload
Basic Request:
{
"docContent": "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCjIgMCBvYmoKPDwKL1R5cGUgL1BhZ2VzCi9LaWRzIFszIDAgUl0KL0NvdW50IDEKPD4KZW5kb2JqCjMgMCBvYmoKPDwKL1R5cGUgL1BhZ2UKL1BhcmVudCAyIDAgUgovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Ci9Gb250IDw8Ci9GMSA0IDAgUgo+Pgo+PgovQ29udGVudHMgNSAwIFIKPj4KZW5kb2JqCjQgMCBvYmoKPDwKL1R5cGUgL0ZvbnQKL1N1YnR5cGUgL1R5cGUxCi9CYXNlRm9udCAvSGVsdmV0aWNhCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9MZW5ndGggNDQKPj4Kc3RyZWFtCkJUCi9GMSAxMiBUZgoxMDAgNzAwIFRkCihIZWxsbyBXb3JsZCkgVGoKRVQKZW5kc3RyZWFtCmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDkgMDAwMDAgbiAKMDAwMDAwMDA1NCAwMDAwMCBuIAowMDAwMDAwMTAxIDAwMDAwIG4gCjAwMDAwMDAxNzAgMDAwMDAgbiAKMDAwMDAwMDI0NCAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKPj4Kc3RhcnR4cmVmCjM0MQolJUVPRg==",
"docName": "output.pdf"
}
With Asynchronous Processing:
{
  "docContent": "JVBERi0xLjQKJeLjz9MK...",
  "docName": "output.pdf",
  "async": true
}
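The payloads above can be assembled programmatically. The Python sketch below builds, but does not send, a request with the standard library. The host in `BASE_URL` is an assumption taken from the Location header example in this document, and `build_classify_request` and its parameters are hypothetical names.

```python
import base64
import json
import urllib.request

# Assumption: same host as the Location header example in this document.
BASE_URL = "https://api.pdf4me.com"


def build_classify_request(pdf_bytes, doc_name, encoded_api_key, run_async=False):
    """Build (without sending) a POST request for /api/v2/ClassifyDocument."""
    payload = {
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        "docName": doc_name,
    }
    if run_async:
        payload["async"] = True  # optional parameter, omitted when False
    return urllib.request.Request(
        BASE_URL + "/api/v2/ClassifyDocument",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + encoded_api_key,
        },
        method="POST",
    )


# The bytes here are only the first line of a PDF, used as a stand-in.
request = build_classify_request(b"%PDF-1.4\n", "output.pdf", "YWJjMTIz", run_async=True)
body = json.loads(request.data)
print(request.get_method(), request.full_url, body["docName"])
```

To send the request, pass it to `urllib.request.urlopen(request)` and read the status code and body from the returned response object.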
Code Samples
The PDF4me Classify Document REST API provides code samples in multiple programming languages. Choose the language that best fits your development environment:
- C#
- Java
- JavaScript
- Python
- Salesforce
- n8n
- Google Script
- AWS Lambda
Document Classification Features
Classification Capabilities
- Document Type Detection: Identification of document types based on content analysis
- Content Categorization: Grouping of documents by content and purpose
- Template Recognition: Identification of document templates and standard formats
- Compliance Classification: Classification for regulatory and compliance requirements
Enterprise Features
- Batch Processing: Process multiple documents with consistent classification standards
- Quality Assurance: Ensure classification accuracy and reliability for business applications
- Integration Ready: Seamless integration with existing document management systems
- Scalable Processing: Handle large volumes of documents with enterprise-grade performance
Industry Use Cases & Applications
- Document Management & Organization
- Compliance & Regulatory
- Business Process Automation
- Financial Services
- Healthcare & Medical
- Legal & Professional Services
Document Management & Organization Use Cases
- Automated Filing: Automatically classify and organize documents in digital filing systems
- Content Management: Advanced categorization for content management and retrieval systems
- Archive Organization: Systematic classification of historical documents and records
- Search Optimization: Enhanced document search through accurate classification and tagging
Compliance & Regulatory Use Cases
- Audit Preparation: Automatic classification of documents for compliance and audit purposes
- Regulatory Reporting: Streamlined document processing for regulatory submissions
- Legal Discovery: Efficient document classification for legal discovery and litigation support
- Policy Compliance: Automated verification of document compliance with organizational policies
Business Process Automation Use Cases
- Workflow Routing: Automatic routing of documents based on classification results
- Process Optimization: Streamlined document processing workflows with advanced classification
- Quality Control: Automated verification of document types and content accuracy
- Resource Allocation: Efficient allocation of processing resources based on document types
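Workflow routing on top of a classification result might look like the following Python sketch. The queue names, the `ROUTES` table, and the 0.8 confidence cutoff are invented for illustration; only the `documentType` and `confidence` fields come from the API response format.

```python
# Hypothetical mapping from document type to a downstream queue.
ROUTES = {
    "invoice": "accounts-payable",
    "contract": "legal-review",
    "report": "archive",
}
MANUAL_REVIEW = "manual-review"


def route_document(classification, min_confidence=0.8):
    """Pick a queue for a classification dict; fall back to manual review."""
    if classification.get("confidence", 0.0) < min_confidence:
        return MANUAL_REVIEW  # low-confidence results go to a person
    return ROUTES.get(classification.get("documentType"), MANUAL_REVIEW)


print(route_document({"documentType": "invoice", "confidence": 0.95}))
print(route_document({"documentType": "invoice", "confidence": 0.40}))
```

Sending unknown types and low-confidence results to a manual queue keeps misclassified documents from silently entering an automated process.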
Financial Services Use Cases
- Invoice Processing: Automatic classification of invoices, receipts, and financial documents
- Loan Documentation: Streamlined processing of loan applications and supporting documents
- Tax Preparation: Automated classification of tax-related documents and forms
- Audit Support: Efficient document organization for financial audits and reviews
Healthcare & Medical Use Cases
- Patient Records: Automatic classification of medical records and patient documentation
- Insurance Claims: Streamlined processing of insurance claims and medical documentation
- Compliance Documentation: Automated classification for healthcare regulatory compliance
- Research Data: Systematic organization of medical research and clinical trial documents
Legal & Professional Services Use Cases
- Contract Management: Automatic classification of contracts and legal agreements
- Case Documentation: Efficient organization of legal case files and supporting documents
- Compliance Monitoring: Automated classification for regulatory compliance and monitoring
- Client Documentation: Streamlined processing of client files and professional documentation