Extract Attachment from PDF - File Recovery API
PDF4me Extract Attachment From PDF extracts every file attachment embedded in a PDF document. You send the PDF to the REST API as a Base64-encoded string, so the binary content can travel inside a JSON body, and the service returns the embedded files. Attachments of multiple file types are supported, which makes the service a good fit for document management systems and file extraction workflows.
Authenticating Your API Request
To access the PDF4me REST API, every request must include proper authentication credentials. Authentication ensures secure communication and validates your identity as an authorized user of the REST API.
Key Features
- File Extraction: Extract all embedded files, documents, and resources from PDF documents
- Multiple File Types: Support for various attachment formats including text, images, documents, and archives
- Batch Extraction: Extract multiple attachments from a single PDF document in one operation
- Base64 Encoding: File content is sent and returned as Base64-encoded text, so binary data fits inside JSON
- Simple API Integration: RESTful API designed for automated document extraction workflows
REST API Endpoint
The PDF4me REST API uses standard HTTP methods to interact with resources. All attachment extraction operations are performed through a single endpoint:
- Method: POST
- Endpoint: /api/v2/ExtractAttachmentFromPdf
REST API Parameters
Complete list of parameters for the Extract Attachment From PDF REST API. Parameters are organized by category for better understanding and implementation.
Important: Parameters marked with an asterisk (*) are required and must be provided for the API to function correctly.
Required Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| docContent* | Base64 (String) | Content of the input PDF file, encoded as a Base64 string | JVBERi... |
| docName* | String | Name of the source PDF file, including the .pdf extension | output.pdf |
Optional Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| async | Boolean | Enable asynchronous processing. When true, the API returns 202 Accepted with a Location header for polling the result | true |
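The parameters above map directly onto a JSON body. The following is a minimal sketch, not an official client: `build_payload` and `run_async` are hypothetical names, and the check on the .pdf extension is a local convenience rather than documented API behavior.

```python
import base64
import json


def build_payload(pdf_bytes: bytes, doc_name: str, run_async: bool = False) -> dict:
    """Assemble the JSON body for ExtractAttachmentFromPdf (hypothetical helper)."""
    if not doc_name.lower().endswith(".pdf"):
        raise ValueError("docName should carry a .pdf extension")
    payload = {
        # Required: the PDF bytes, Base64-encoded as ASCII text
        "docContent": base64.b64encode(pdf_bytes).decode("ascii"),
        # Required: the source file name
        "docName": doc_name,
    }
    if run_async:
        # Optional: ask the API to answer 202 Accepted and process in the background
        payload["async"] = True
    return payload


# Placeholder bytes stand in for a real PDF file read from disk
payload = build_payload(b"%PDF-1.4 minimal example bytes", "output.pdf", run_async=True)
body = json.dumps(payload)
```

Leaving `async` out of the body entirely when it is not needed relies on the documented default of synchronous processing.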
Output
The PDF4me Extract Attachment From PDF REST API returns different responses based on the processing mode. The API returns extracted attachment data as a JSON response or as a ZIP file containing all extracted attachments.
Synchronous Processing (Default)
When async is false or not provided, the API returns the extracted attachment results immediately.
Status Code: 200 OK
Response Format:
{
  "outputDocuments": [
    {
      "fileName": "attachment1.txt",
      "streamFile": "base64-encoded-file-content..."
    },
    {
      "fileName": "attachment2.pdf",
      "streamFile": "base64-encoded-file-content..."
    }
  ]
}
The response contains an array of extracted attachments, each with a fileName and streamFile (Base64-encoded content).
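A response of this shape can be written to disk by decoding each `streamFile` value. The sketch below is illustrative only: `save_attachments` is a hypothetical helper, the sample contents are made up, and stripping directory components from `fileName` is a precaution added here rather than something the API requires.

```python
import base64
import os
import tempfile


def save_attachments(response_json: dict, target_dir: str) -> list:
    """Decode each entry of outputDocuments and write it under target_dir."""
    saved = []
    for doc in response_json.get("outputDocuments", []):
        # Keep only the final path component so an embedded name cannot
        # escape target_dir (a precaution added here, not part of the API)
        safe_name = os.path.basename(doc["fileName"]) or "attachment.bin"
        path = os.path.join(target_dir, safe_name)
        with open(path, "wb") as handle:
            handle.write(base64.b64decode(doc["streamFile"]))
        saved.append(path)
    return saved


# Sample response shaped like the documented one, with tiny made-up contents
sample = {
    "outputDocuments": [
        {"fileName": "attachment1.txt",
         "streamFile": base64.b64encode(b"hello").decode("ascii")},
        {"fileName": "attachment2.pdf",
         "streamFile": base64.b64encode(b"%PDF-1.4").decode("ascii")},
    ]
}
out_dir = tempfile.mkdtemp()
paths = save_attachments(sample, out_dir)
```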
Asynchronous Processing
When async is true, the API processes the document asynchronously.
Initial Response:
Status Code: 202 Accepted
Response Headers:
Location: https://api.pdf4me.com/api/v2/ExtractAttachmentFromPdfStatus/{operationId}
Response Body:
{
  "traceId": "operation-trace-id"
}
Polling for Results:
Use the Location header URL to poll for completion:
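// locationUrl is the value of the Location header from the 202 response;
// apiKey is the Base64-encoded API key
const response = await fetch(locationUrl, {
  headers: { 'Authorization': 'Basic ' + apiKey }
});
if (response.status === 200) {
  // Processing has finished: the body holds the extracted attachments
  const result = await response.json();
} else {
  // Not ready yet: wait, then send the same request again
}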
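The asynchronous flow as a whole (submit, read the Location header, poll until 200) might be sketched in Python as follows. This is an outline under assumptions: `wait_for_result`, `submit`, and `fetch` are hypothetical names, the transport is injected so the logic can be exercised without network access, and treating any non-200 status below 400 as "still processing" is a guess, since the page only says to poll until the status code is 200.

```python
import json
import time


def wait_for_result(submit, fetch, max_attempts=10, delay_seconds=2.0):
    """Submit an async job, then poll its Location URL until it returns 200.

    submit() -> (status, headers, body) for the initial POST
    fetch(url) -> (status, body) for each polling GET
    Both callables are hypothetical transport hooks supplied by the caller.
    """
    status, headers, body = submit()
    if status == 200:
        return json.loads(body)  # the API answered synchronously
    if status != 202:
        raise RuntimeError(f"Submit failed with HTTP {status}")
    location_url = headers["Location"]
    for _ in range(max_attempts):
        status, body = fetch(location_url)
        if status == 200:
            return json.loads(body)
        if status >= 400:
            raise RuntimeError(f"Polling failed with HTTP {status}")
        time.sleep(delay_seconds)  # assumed "not ready yet": wait and retry
    raise TimeoutError(f"No result after {max_attempts} attempts")


# Simulated transport: two "not ready" answers, then the result
answers = iter([(202, b""), (202, b""), (200, b'{"outputDocuments": []}')])
status_url = "https://api.pdf4me.com/api/v2/ExtractAttachmentFromPdfStatus/{operationId}"
result = wait_for_result(
    submit=lambda: (202, {"Location": status_url}, b'{"traceId": "operation-trace-id"}'),
    fetch=lambda url: next(answers),
    delay_seconds=0,
)
```

The attempt limit and delay are illustrative values; a bounded loop avoids polling forever if a job never completes.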
Error Responses
| Status Code | Description | Example Response |
|---|---|---|
| 400 Bad Request | Invalid request parameters or missing required fields | {"error": "Missing required parameter: docContent"} |
| 401 Unauthorized | Invalid or missing API key | {"error": "Unauthorized"} |
| 408 Request Timeout | Request processing timeout | {"error": "Request timeout"} |
| 500 Internal Server Error | Server error during processing | {"error": "Internal server error"} |
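A client can turn these responses into a small summary for logging or retry decisions. The sketch below copies the status codes and meanings from the table; `explain_error` is a hypothetical helper, and marking 408 and 500 as retryable is an assumption of this example, not documented API guidance.

```python
import json

# Status codes and meanings copied from the error table
ERROR_MEANINGS = {
    400: "Invalid request parameters or missing required fields",
    401: "Invalid or missing API key",
    408: "Request processing timeout",
    500: "Server error during processing",
}

# Retrying only makes sense when the same request might succeed later;
# treating 408 and 500 as retryable is an assumption of this sketch
RETRYABLE = {408, 500}


def explain_error(status: int, body: bytes) -> dict:
    """Summarize an error response for logging or retry logic."""
    try:
        detail = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        detail = ""  # body was not a JSON object of the form {"error": ...}
    return {
        "status": status,
        "meaning": ERROR_MEANINGS.get(status, "Unexpected status"),
        "detail": detail,
        "retry": status in RETRYABLE,
    }


summary = explain_error(400, b'{"error": "Missing required parameter: docContent"}')
```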
Understanding the JSON Response
The extraction response is a JSON object containing an array of extracted attachments:
- outputDocuments: Array of extracted attachment objects
- fileName: The name of the extracted file
- streamFile: Base64-encoded content of the extracted file
Alternative: ZIP File Response
In some cases, the API may return a ZIP file containing all extracted attachments instead of JSON. The ZIP file can be downloaded and extracted to access individual files.
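Because the body may be either JSON or a ZIP archive, a client can inspect the bytes before parsing. The following sketch assumes the two formats can be told apart by content alone, which the page does not state; `read_attachments` is a hypothetical helper, and the in-memory archive only stands in for a real ZIP response.

```python
import base64
import io
import json
import zipfile


def read_attachments(body: bytes) -> dict:
    """Return {file name: content bytes} from either a ZIP or a JSON body."""
    if zipfile.is_zipfile(io.BytesIO(body)):
        with zipfile.ZipFile(io.BytesIO(body)) as archive:
            return {name: archive.read(name) for name in archive.namelist()}
    documents = json.loads(body)["outputDocuments"]
    return {doc["fileName"]: base64.b64decode(doc["streamFile"]) for doc in documents}


# Build a small in-memory ZIP to stand in for a ZIP response
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("attachment1.txt", "hello")
from_zip = read_attachments(buffer.getvalue())

# The same attachment delivered in the JSON shape
json_body = json.dumps({"outputDocuments": [
    {"fileName": "attachment1.txt",
     "streamFile": base64.b64encode(b"hello").decode("ascii")}
]}).encode("utf-8")
from_json = read_attachments(json_body)
```

Both paths yield the same mapping, so downstream code does not need to know which format arrived.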
Decoding Base64 Content:
To decode the Base64-encoded file content from streamFile:
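// JavaScript
// atob() yields a binary string; convert it to bytes before building the Blob
const bytes = Uint8Array.from(atob(base64String), c => c.charCodeAt(0));
const blob = new Blob([bytes], { type: 'application/octet-stream' });
# Python
import base64
decoded = base64.b64decode(base64_string)
# Write in binary mode so the bytes are stored unchanged
with open('extracted_file.txt', 'wb') as f:
    f.write(decoded)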
Request Example
Header
Content-Type: application/json
Authorization: Basic YOUR_BASE64_ENCODED_API_KEY
Note:
- Get your API key from the PDF4me Dashboard
- The API key must be Base64 encoded and prefixed with "Basic " in the Authorization header
- Example: If your API key is abc123, encode it to Base64 and use Authorization: Basic YWJjMTIz
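The encoding step in the note can be reproduced in a few lines. This is a small sketch with a hypothetical helper name, `basic_auth_header`; it uses the same sample key as the note.

```python
import base64


def basic_auth_header(api_key: str) -> dict:
    """Base64-encode the raw API key and prefix it with 'Basic '."""
    encoded = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + encoded}


headers = basic_auth_header("abc123")
print(headers["Authorization"])  # Basic YWJjMTIz
```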
Payload
Basic Request:
{
"docContent": "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMiAwIFIKPj4KZW5kb2JqCjIgMCBvYmoKPDwKL1R5cGUgL1BhZ2VzCi9LaWRzIFszIDAgUl0KL0NvdW50IDEKPD4KZW5kb2JqCjMgMCBvYmoKPDwKL1R5cGUgL1BhZ2UKL1BhcmVudCAyIDAgUgovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Ci9Gb250IDw8Ci9GMSA0IDAgUgo+Pgo+PgovQ29udGVudHMgNSAwIFIKPj4KZW5kb2JqCjQgMCBvYmoKPDwKL1R5cGUgL0ZvbnQKL1N1YnR5cGUgL1R5cGUxCi9CYXNlRm9udCAvSGVsdmV0aWNhCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9MZW5ndGggNDQKPj4Kc3RyZWFtCkJUCi9GMSAxMiBUZgoxMDAgNzAwIFRkCihIZWxsbyBXb3JsZCkgVGoKRVQKZW5kc3RyZWFtCmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDkgMDAwMDAgbiAKMDAwMDAwMDA1NCAwMDAwMCBuIAowMDAwMDAwMTAxIDAwMDAwIG4gCjAwMDAwMDAxNzAgMDAwMDAgbiAKMDAwMDAwMDI0NCAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKPj4Kc3RhcnR4cmVmCjM0MQolJUVPRg==",
"docName": "output.pdf"
}
With Asynchronous Processing:
{
  "docContent": "JVBERi0xLjQKJeLjz9MK...",
  "docName": "output.pdf",
  "async": true
}
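Putting the headers and a payload together, a request could be prepared with Python's standard library as sketched below. The full URL is an assumption: it joins the host shown in the Location header example with the documented endpoint path. `build_request` is a hypothetical helper that prepares the request without sending it, and the `docContent` value is the truncated placeholder from the payload example, not a usable PDF.

```python
import json
import urllib.request

# Host taken from the Location header example; assumed to serve this endpoint too
API_URL = "https://api.pdf4me.com/api/v2/ExtractAttachmentFromPdf"


def build_request(payload: dict, api_key_b64: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for attachment extraction."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + api_key_b64,
        },
        method="POST",
    )


payload = {
    "docContent": "JVBERi0xLjQKJeLjz9MK...",  # truncated placeholder, as in the example
    "docName": "output.pdf",
    "async": True,
}
request = build_request(payload, "YOUR_BASE64_ENCODED_API_KEY")
# To send it: urllib.request.urlopen(request), then read and parse the body
```

Keeping request construction separate from sending makes the headers and body easy to inspect before any network call is made.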
Code Samples
The PDF4me Extract Attachment From PDF REST API provides code samples in multiple programming languages. Choose the language that best fits your development environment:
- C#
- Java
- JavaScript
- Python
- Salesforce
- n8n
- Google Script
- AWS Lambda
Attachment Extraction Features
File Recovery Capabilities
- Complete Extraction: Extract all embedded files and resources from PDF documents
- File Type Detection: Automatic identification of embedded file types and formats
- Metadata Preservation: Maintain original file names, creation dates, and properties
- Quality Assurance: Ensure file integrity and proper extraction without corruption
Document Processing
- PDF Analysis: Deep analysis of PDF structure to locate embedded attachments
- Resource Identification: Intelligent detection of various embedded file types
- Batch Extraction: Process multiple attachments from a single PDF in one operation
- Secure Processing: Safe extraction with file integrity verification
Advanced Features
- Multiple Format Support: Extract documents, images, archives, and multimedia files
- File Organization: Maintain proper file naming and organization during extraction
- Content Validation: Verify extracted file integrity and completeness
- Professional Results: High-quality extraction suitable for enterprise applications
Industry Use Cases & Applications
Document Management & Recovery Use Cases
- Archive Recovery: Extract files from archived PDF documents for data recovery
- Content Management: Retrieve embedded resources from document management systems
- File Organization: Extract and organize files from complex PDF packages
- Data Migration: Extract attachments during document migration processes
Legal & Compliance Use Cases
- Evidence Recovery: Extract supporting documents and evidence from legal PDFs
- Compliance Documentation: Retrieve regulatory documents and certificates
- Audit Support: Extract audit trails and supporting materials
- Legal Discovery: Recover embedded files for legal discovery processes
Business & Finance Use Cases
- Invoice Processing: Extract supporting documents from invoice PDFs
- Contract Management: Retrieve contract attachments and amendments
- Financial Reporting: Extract supporting spreadsheets and reports
- Document Analysis: Analyze embedded content for business intelligence
Technical & Engineering Use Cases
- Technical Documentation: Extract code samples, specifications, and diagrams
- Project Files: Retrieve project resources and supporting materials
- Design Assets: Extract images, CAD files, and design resources
- Research Data: Recover datasets and research materials from PDFs
Healthcare & Medical Use Cases
- Medical Records: Extract patient files and medical documentation
- Research Data: Retrieve clinical trial data and research materials
- Compliance Files: Extract regulatory documents and certifications
- Patient Care: Recover supporting medical images and reports