
Extract Table from PDF - Data Parser API

PDF4me Extract Table From PDF extracts table structures and data from PDF documents. The API receives the PDF through a REST call, with the file content Base64-encoded so that the binary data can be carried inside the JSON request body, and returns the extracted tables as structured data. Base64 is an encoding, not encryption; the request is authenticated through the Authorization header. The service is suited to data processing workflows and business intelligence platforms.

Authenticating Your API Request

Every request to the PDF4me REST API must include authentication credentials. The API key is sent in the Authorization header, as described in the Request Example section.

Key Features

  • Table Extraction: Extract table structures and data from PDF documents
  • Structured Data Output: Retrieve table data in organized, structured formats for easy integration
  • Table Structure Preservation: Maintain table layout, headers, rows, and column relationships
  • Base64 Encoding: Send binary file content as Base64 text inside the JSON request body
  • Simple API Integration: RESTful API designed for automated table extraction workflows

REST API Endpoint

The PDF4me REST API uses standard HTTP methods to interact with resources. All table extraction operations are performed through a single endpoint:

  • Method: POST
  • Endpoint: /api/v2/ExtractTableFromPdf

REST API Parameters

The parameters for the Extract Table From PDF REST API are listed below, grouped into required and optional.

Important: Parameters marked with an asterisk (*) are required and must be provided for the API to function correctly.

Required Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
| docName* | String | Source PDF file name with proper .pdf extension for document identification and table extraction processing | output.pdf |
| docContent* | Base64 (String) | The content of the input PDF file encoded in Base64 format for table analysis and data extraction processing | JVBERi... |

Optional Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
| async | Boolean | Enable asynchronous processing. When true, the API returns 202 Accepted with a Location header for polling the result | true |
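
The polling flow for async requests could be sketched as follows. This is a minimal illustration, not PDF4me's own client code: the function name poll_for_result, the fetch callable, the retry limits, and the assumption that the polling URL keeps answering 202 until the result is ready and then answers 200 are all hypothetical and should be checked against the service's actual behavior.

```python
import time
from typing import Callable, Tuple


def poll_for_result(
    location: str,
    fetch: Callable[[str], Tuple[int, bytes]],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> bytes:
    """Poll the URL from the Location header until a result is available.

    `fetch` is a caller-supplied function that performs an authenticated GET
    and returns (status_code, body). Assumption: 202 means "still processing"
    and 200 means "done"; any other status is treated as an error.
    """
    for _ in range(max_attempts):
        status, body = fetch(location)
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"Unexpected status {status} while polling")
        time.sleep(interval_seconds)
    raise TimeoutError(f"No result after {max_attempts} attempts")
```

Passing the HTTP call in as a function keeps the retry logic independent of any particular HTTP library and makes it easy to exercise with a stub.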

Output

The PDF4me Extract Table From PDF REST API returns the extracted table data as JSON. How the response is delivered depends on the processing mode.

Synchronous Processing (Default)

When async is false or not provided, the API returns the extracted table data immediately.

Status Code: 200 OK

Response Format:

{
  "tables": [
    {
      "rows": [
        ["Header 1", "Header 2", "Header 3"],
        ["Data 1", "Data 2", "Data 3"],
        ["Data 4", "Data 5", "Data 6"]
      ],
      "columns": 3,
      "rows": 3
    }
  ]
}

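The response contains an array of tables. Each table carries its cell data as a list of rows, together with row and column counts. The sample above uses the key rows twice, once for the cell data and once for the count; JSON parsers commonly keep only the last occurrence of a duplicated key, so confirm the actual field names against a real response.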
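
As an illustration of consuming such a response, the sketch below turns each table into a list of dictionaries keyed by the header row. It assumes that rows holds the cell data as a list of lists and that the first row is the header; neither is guaranteed by this page. The function name tables_to_records is hypothetical, and the sample omits the numeric count fields so that each key appears only once.

```python
import json
from typing import Any, Dict, List


def tables_to_records(response_text: str) -> List[List[Dict[str, Any]]]:
    """Convert each table in the response into a list of row dictionaries.

    Assumption: table["rows"] is a list of lists whose first entry is the
    header row. Tables without usable row data yield an empty list.
    """
    payload = json.loads(response_text)
    result = []
    for table in payload.get("tables", []):
        rows = table.get("rows")
        if not isinstance(rows, list) or not rows:
            result.append([])
            continue
        header, *data = rows
        result.append([dict(zip(header, row)) for row in data])
    return result


sample = json.dumps({
    "tables": [{
        "rows": [
            ["Header 1", "Header 2", "Header 3"],
            ["Data 1", "Data 2", "Data 3"],
            ["Data 4", "Data 5", "Data 6"],
        ],
        "columns": 3,
    }]
})

records = tables_to_records(sample)
print(records[0][0])
# {'Header 1': 'Data 1', 'Header 2': 'Data 2', 'Header 3': 'Data 3'}
```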

Request Example

Content-Type: application/json
Authorization: Basic YOUR_BASE64_ENCODED_API_KEY

Note:

  • Get your API key from the PDF4me Dashboard
  • The API key must be Base64 encoded and prefixed with "Basic " in the Authorization header
  • Example: If your API key is abc123, encode it to Base64 and use Authorization: Basic YWJjMTIz
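
The encoding step described in the note can be sketched in a few lines. The helper name build_auth_header is hypothetical; the only behavior relied on is the one stated above, namely Base64-encoding the key and prefixing it with "Basic ".

```python
import base64


def build_auth_header(api_key: str) -> str:
    """Return the Authorization header value for a PDF4me API key."""
    encoded = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return f"Basic {encoded}"


print(build_auth_header("abc123"))  # Basic YWJjMTIz
```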

Payload

Basic Request:

{
"docContent": "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCjIgMCBvYmoKPDwKL1R5cGUgL1BhZ2VzCi9LaWRzIFszIDAgUl0KL0NvdW50IDEKPD4KZW5kb2JqCjMgMCBvYmoKPDwKL1R5cGUgL1BhZ2UKL1BhcmVudCAyIDAgUgovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Ci9Gb250IDw8Ci9GMSA0IDAgUgo+Pgo+PgovQ29udGVudHMgNSAwIFIKPj4KZW5kb2JqCjQgMCBvYmoKPDwKL1R5cGUgL0ZvbnQKL1N1YnR5cGUgL1R5cGUxCi9CYXNlRm9udCAvSGVsdmV0aWNhCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9MZW5ndGggNDQKPj4Kc3RyZWFtCkJUCi9GMSAxMiBUZgoxMDAgNzAwIFRkCihIZWxsbyBXb3JsZCkgVGoKRVQKZW5kc3RyZWFtCmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDkgMDAwMDAgbiAKMDAwMDAwMDA1NCAwMDAwMCBuIAowMDAwMDAwMTAxIDAwMDAwIG4gCjAwMDAwMDAxNzAgMDAwMDAgbiAKMDAwMDAwMDI0NCAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKPj4Kc3RhcnR4cmVmCjM0MQolJUVPRg==",
"docName": "output.pdf"
}

With Asynchronous Processing:

{
"docContent": "JVBERi0xLjQKJeLjz9MK...",
"docName": "output.pdf",
"async": true
}
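
A request with either payload could be assembled along these lines. This is a sketch under stated assumptions: the host in BASE_URL is not given on this page and is assumed here, and the helper names build_payload and build_request are illustrative only. The code builds the request object without sending it.

```python
import base64
import json
import urllib.request

# Assumption: the API host is not stated in this document; adjust as needed.
BASE_URL = "https://api.pdf4me.com"
ENDPOINT = "/api/v2/ExtractTableFromPdf"


def build_payload(pdf_bytes: bytes, doc_name: str, use_async: bool = False) -> dict:
    """Build the JSON body: Base64-encoded content plus the file name."""
    payload = {
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        "docName": doc_name,
    }
    if use_async:
        # Only include the optional flag when asynchronous mode is wanted.
        payload["async"] = True
    return payload


def build_request(payload: dict, auth_header: str) -> urllib.request.Request:
    """Create (but do not send) the POST request for the endpoint."""
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": auth_header,
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and read the response body; in asynchronous mode, read the Location header from the 202 response instead.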

Code Samples

The PDF4me Extract Table From PDF REST API provides code samples in multiple programming languages. Choose the language that best fits your development environment:

C# (CSharp) Sample

Complete C# implementation for extracting tables from PDF:

Table Extraction Features

Data Extraction Capabilities

  • Complete Table Recovery: Extract all table data from PDF documents with accurate structure preservation
  • Header Recognition: Identify and preserve table headers, column names, and row labels
  • Cell Content Extraction: Extract individual cell content with proper data types and formatting
  • Table Structure Analysis: Maintain table layout, relationships, and hierarchical structure

Advanced Processing

  • Multi-table Support: Process multiple tables within a single PDF document
  • Complex Table Handling: Extract tables with merged cells, spanning rows, and complex layouts
  • Format Preservation: Maintain original table formatting, styling, and visual appearance
  • Data Validation: Ensure data integrity and accuracy during the extraction process

Professional Features

  • Structured Output: Generate organized, structured data for easy integration
  • Batch Processing: Process multiple PDFs and extract tables efficiently
  • Quality Assurance: Ensure extraction accuracy and reliability for business applications
  • Enterprise Integration: Seamless integration with existing data management systems

Industry Use Cases & Applications

Business Intelligence & Analytics Use Cases

  • Financial Reporting: Extract financial tables from reports for data analysis and reporting
  • Performance Metrics: Process KPI tables and performance dashboards for business intelligence
  • Data Warehousing: Extract structured data from documents for data warehouse population
  • Business Analytics: Convert PDF tables into analyzable data formats for insights
