Extract Table from PDF - Data Parser API
PDF4me Extract Table From PDF extracts table structures and their data from PDF documents. The API receives the PDF through a REST call, with the file content encoded as a Base64 string in the JSON request body, and returns the extracted tables as structured data. This makes it suited to data processing workflows and business intelligence platforms.
Authenticating Your API Request
To access the PDF4me REST API, every request must include proper authentication credentials. Authentication ensures secure communication and validates your identity as an authorized user of the REST API.
Key Features
- Table Extraction: Extract table structures and data from PDF documents
- Structured Data Output: Retrieve table data in organized, structured formats for easy integration
- Table Structure Preservation: Maintain table layout, headers, rows, and column relationships
- Base64 Encoding: File content is sent as a Base64-encoded string in the JSON request body
- Simple API Integration: RESTful API designed for automated table extraction workflows
REST API Endpoint
The PDF4me REST API uses standard HTTP methods to interact with resources. All table extraction operations are performed through a single endpoint:
- Method: POST
- Endpoint: /api/v2/ExtractTableFromPdf
REST API Parameters
Complete list of parameters for the Extract Table From PDF REST API. Parameters are organized by category for better understanding and implementation.
Important: Parameters marked with an asterisk (*) are required and must be provided for the API to function correctly.
Required Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| docName* | String | Name of the source PDF file, including the .pdf extension | output.pdf |
| docContent* | Base64 (String) | Content of the input PDF file, encoded in Base64 | JVBERi... |
Optional Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| async | Boolean | Enable asynchronous processing. When true, the API returns 202 Accepted with a Location header for polling the result | true |
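The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any PDF4me SDK: `validatePayload` is a hypothetical helper name, and the `.pdf` extension check simply mirrors the docName description in the table. The API performs its own validation regardless.

```javascript
// Hypothetical client-side check of the request payload.
// Returns a list of problems; an empty list means the payload looks valid.
function validatePayload(payload) {
  const problems = [];

  // docName and docContent are the two required string parameters.
  for (const name of ['docName', 'docContent']) {
    if (typeof payload[name] !== 'string' || payload[name].length === 0) {
      problems.push('Missing required parameter: ' + name);
    }
  }

  // Assumption: a file name without the .pdf extension is treated as a mistake.
  if (typeof payload.docName === 'string' && payload.docName.length > 0 &&
      !payload.docName.toLowerCase().endsWith('.pdf')) {
    problems.push('docName should end with .pdf');
  }

  // async is optional, but must be a boolean when present.
  if ('async' in payload && typeof payload.async !== 'boolean') {
    problems.push('async must be a boolean');
  }

  return problems;
}

console.log(validatePayload({ docName: 'output.pdf' }));
```

Returning a list rather than throwing on the first problem lets a caller report every issue at once.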
Output
The PDF4me Extract Table From PDF REST API returns different responses based on the processing mode. The API returns extracted table data as a JSON response.
Synchronous Processing (Default)
When async is false or not provided, the API returns the extracted table data immediately.
Status Code: 200 OK
Response Format:
{
  "tables": [
    {
      "rows": [
        ["Header 1", "Header 2", "Header 3"],
        ["Data 1", "Data 2", "Data 3"],
        ["Data 4", "Data 5", "Data 6"]
      ],
      "columns": 3,
      "rows": 3
    }
  ]
}
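The response contains an array of tables. Each table carries its cell data as an array of rows, plus a column count and a row count. As printed, the sample uses the key rows for both the cell array and the row count; a JSON parser keeps only one value per key, so confirm the actual name of the row-count field against a live response.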
Asynchronous Processing
When async is true, the API processes the document asynchronously.
Initial Response:
Status Code: 202 Accepted
Response Headers:
Location: https://api.pdf4me.com/api/v2/ExtractTableFromPdfStatus/{operationId}
Response Body:
{
"traceId": "operation-trace-id"
}
Polling for Results:
Use the Location header URL to poll for completion:
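// Poll the Location URL until the result is ready (status code 200).
// apiKey is the Base64-encoded API key; delayMs and maxAttempts are illustrative defaults.
async function pollForResult(locationUrl, apiKey, delayMs = 2000, maxAttempts = 30) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetch(locationUrl, {
      headers: { 'Authorization': 'Basic ' + apiKey }
    });
    if (response.status === 200) {
      return response.json(); // extracted table data
    }
    if (response.status >= 400) {
      throw new Error('Polling failed with status ' + response.status);
    }
    // Not ready yet: wait before polling again
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('Timed out waiting for table extraction');
}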
Error Responses
| Status Code | Description | Example Response |
|---|---|---|
| 400 Bad Request | Invalid request parameters or missing required fields | {"error": "Missing required parameter: docContent"} |
| 401 Unauthorized | Invalid or missing API key | {"error": "Unauthorized"} |
| 408 Request Timeout | Request processing timeout | {"error": "Request timeout"} |
| 500 Internal Server Error | Server error during processing | {"error": "Internal server error"} |
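One way to act on these status codes in client code is to map each one to a next step. This is a sketch under assumptions: treating 408 and 500 as retryable is a client-side policy choice rather than documented API behavior, and `nextStepForStatus` is a hypothetical name.

```javascript
// Map an HTTP status code from the API to the client's next step.
function nextStepForStatus(status) {
  if (status === 200) return 'done';                          // table data is in the response body
  if (status === 202) return 'poll';                          // async job accepted; poll the Location URL
  if (status === 400 || status === 401) return 'fix-request'; // bad parameters or API key; retrying will not help
  if (status === 408 || status === 500) return 'retry';       // assumption: these may be transient
  return 'unknown';                                           // anything not listed in the table above
}

for (const status of [200, 202, 400, 401, 408, 500]) {
  console.log(status, nextStepForStatus(status));
}
```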
Understanding the JSON Response
The table extraction response is a JSON object containing:
- tables: Array of table objects, each containing:
  - rows: Array of arrays representing table rows (each row is an array of cell values)
  - columns: Number of columns in the table
  - rows: Number of rows in the table
Table Structure:
Each table is represented as a two-dimensional array where:
- The outer array represents rows
- The inner arrays represent cells in each row
- The first row typically contains column headers
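Because the first row typically holds the column headers, a table can be turned into a list of records keyed by header name. The sketch below assumes that `rows` on a parsed table object is the two-dimensional cell array and that the first row really is a header row; `tableToRecords` is a hypothetical helper, not part of the API.

```javascript
// Convert one table object into an array of records keyed by the header row.
// Assumption: table.rows is the 2D cell array and its first row is the header.
function tableToRecords(table) {
  const [header, ...body] = table.rows;
  return body.map((row) =>
    // Cells missing from a short row become null.
    Object.fromEntries(header.map((name, i) => [name, row[i] ?? null]))
  );
}

const sampleTable = {
  rows: [
    ['Header 1', 'Header 2', 'Header 3'],
    ['Data 1', 'Data 2', 'Data 3'],
    ['Data 4', 'Data 5', 'Data 6'],
  ],
  columns: 3,
};

console.log(tableToRecords(sampleTable));
```

If two header cells share the same text, the later column overwrites the earlier one in each record, so tables with repeated header names are better handled by index.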
Request Example
Header
Content-Type: application/json
Authorization: Basic YOUR_BASE64_ENCODED_API_KEY
Note:
- Get your API key from the PDF4me Dashboard
- The API key must be Base64 encoded and prefixed with "Basic " in the Authorization header
- Example: If your API key is abc123, encode it to Base64 and use Authorization: Basic YWJjMTIz
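The encoding step in the note can be done in code rather than by hand. This is a minimal sketch for Node.js, which provides `Buffer` as a global; `buildAuthHeader` is a hypothetical helper name.

```javascript
// Build the Authorization header value from a raw API key.
function buildAuthHeader(apiKey) {
  const encoded = Buffer.from(apiKey, 'utf8').toString('base64');
  return 'Basic ' + encoded;
}

console.log(buildAuthHeader('abc123')); // prints "Basic YWJjMTIz"
```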
Payload
Basic Request:
{
"docContent": "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCjIgMCBvYmoKPDwKL1R5cGUgL1BhZ2VzCi9LaWRzIFszIDAgUl0KL0NvdW50IDEKPD4KZW5kb2JqCjMgMCBvYmoKPDwKL1R5cGUgL1BhZ2UKL1BhcmVudCAyIDAgUgovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Ci9Gb250IDw8Ci9GMSA0IDAgUgo+Pgo+PgovQ29udGVudHMgNSAwIFIKPj4KZW5kb2JqCjQgMCBvYmoKPDwKL1R5cGUgL0ZvbnQKL1N1YnR5cGUgL1R5cGUxCi9CYXNlRm9udCAvSGVsdmV0aWNhCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9MZW5ndGggNDQKPj4Kc3RyZWFtCkJUCi9GMSAxMiBUZgoxMDAgNzAwIFRkCihIZWxsbyBXb3JsZCkgVGoKRVQKZW5kc3RyZWFtCmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDkgMDAwMDAgbiAKMDAwMDAwMDA1NCAwMDAwMCBuIAowMDAwMDAwMTAxIDAwMDAwIG4gCjAwMDAwMDAxNzAgMDAwMDAgbiAKMDAwMDAwMDI0NCAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKPj4Kc3RhcnR4cmVmCjM0MQolJUVPRg==",
"docName": "output.pdf"
}
With Asynchronous Processing:
{
"docContent": "JVBERi0xLjQKJeLjz9MK...",
"docName": "output.pdf",
"async": true
}
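The header and payload above can be assembled into a request for `fetch`. The sketch below makes two assumptions worth stating: the base URL `https://api.pdf4me.com` is taken from the polling URL in the asynchronous response and is assumed to host this endpoint too, and `buildExtractRequest` is a hypothetical helper name. It expects the API key to be Base64 encoded already.

```javascript
// Assumption: the endpoint is served from the same host as the polling URL.
const BASE_URL = 'https://api.pdf4me.com';

// Build the URL and fetch options for a table extraction request.
// pdfBytes: Buffer or Uint8Array holding the PDF file content.
// encodedApiKey: the API key, already Base64 encoded.
function buildExtractRequest(pdfBytes, docName, encodedApiKey, useAsync = false) {
  const payload = {
    docContent: Buffer.from(pdfBytes).toString('base64'),
    docName: docName,
  };
  if (useAsync) {
    payload.async = true; // request 202 Accepted plus a Location header
  }
  return {
    url: BASE_URL + '/api/v2/ExtractTableFromPdf',
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': 'Basic ' + encodedApiKey,
      },
      body: JSON.stringify(payload),
    },
  };
}

const { url, init } = buildExtractRequest(Buffer.from('%PDF-1.4'), 'output.pdf', 'YWJjMTIz');
console.log(init.method, url);
```

Passing the result to `fetch(url, init)` sends the request. Keeping the request construction separate from the network call makes it possible to inspect or log the payload without contacting the API.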
Code Samples
The PDF4me Extract Table From PDF REST API provides code samples in multiple programming languages. Choose the language that best fits your development environment:
- C#
- Java
- JavaScript
- Python
- Salesforce
- n8n
- Google Script
- AWS Lambda
Table Extraction Features
Data Extraction Capabilities
- Complete Table Recovery: Extract all table data from PDF documents with accurate structure preservation
- Header Recognition: Identify and preserve table headers, column names, and row labels
- Cell Content Extraction: Extract individual cell content with proper data types and formatting
- Table Structure Analysis: Maintain table layout, relationships, and hierarchical structure
Advanced Processing
- Multi-table Support: Process multiple tables within a single PDF document
- Complex Table Handling: Extract tables with merged cells, spanning rows, and complex layouts
- Format Preservation: Maintain original table formatting, styling, and visual appearance
- Data Validation: Ensure data integrity and accuracy during the extraction process
Professional Features
- Structured Output: Generate organized, structured data for easy integration
- Batch Processing: Process multiple PDFs and extract tables efficiently
- Quality Assurance: Ensure extraction accuracy and reliability for business applications
- Enterprise Integration: Seamless integration with existing data management systems
Industry Use Cases & Applications
- Business Intelligence & Analytics
- Document Management & Processing
- Data Migration & Integration
- Research & Analysis
- Compliance & Regulatory
- Healthcare & Medical
Business Intelligence & Analytics Use Cases
- Financial Reporting: Extract financial tables from reports for data analysis and reporting
- Performance Metrics: Process KPI tables and performance dashboards for business intelligence
- Data Warehousing: Extract structured data from documents for data warehouse population
- Business Analytics: Convert PDF tables into analyzable data formats for insights
Document Management & Processing Use Cases
- Invoice Processing: Extract line items and pricing tables from invoices for automated processing
- Contract Analysis: Extract terms and conditions tables from legal documents
- Report Digitization: Convert paper-based reports into digital, searchable data
- Content Management: Extract structured data from documents for content management systems
Data Migration & Integration Use Cases
- System Migration: Extract table data during system migrations and data transfers
- Legacy Data Conversion: Convert historical documents into modern data formats
- API Integration: Extract data for integration with third-party systems and APIs
- Database Population: Populate databases with structured data from PDF documents
Research & Analysis Use Cases
- Academic Research: Extract data tables from research papers and academic publications
- Market Research: Process survey results and market analysis tables
- Scientific Data: Extract experimental data and research findings from scientific documents
- Statistical Analysis: Convert PDF tables into data that statistical analysis tools can consume
Compliance & Regulatory Use Cases
- Regulatory Reporting: Extract compliance data from regulatory documents and reports
- Audit Support: Process financial tables and audit trails for compliance verification
- Legal Discovery: Extract structured data from legal documents for discovery processes
- Policy Analysis: Extract policy tables and regulatory requirements for analysis
Healthcare & Medical Use Cases
- Medical Records: Extract patient data tables from medical reports and records
- Clinical Trials: Process research data and clinical trial results
- Insurance Claims: Extract billing tables and claim information
- Medical Research: Convert medical research data into analyzable formats