Extract structured content from PDF documents, with parsing optimized for financial documents.
Endpoint
POST /parser/pdf
Description
This endpoint parses PDF files and is optimized for financial documents. It extracts text while preserving the original layout, identifies tables, recognizes financial statements, and maintains the document's structural hierarchy.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | binary | Yes* | PDF file to parse (max: 50MB) |
| url | string | Yes* | URL of PDF to parse (alternative to file upload) |
| parsing_options | object | No | Parsing configuration options |
*Either file or url is required, not both.
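The either/or rule and the size limit can be checked on the client before anything is sent. The following is a minimal sketch, not part of the API itself: the function name `check_source` is hypothetical, and reading "50MB" as 50 × 1024 × 1024 bytes is an assumption.

```python
from typing import Optional

# "max: 50MB" comes from the parameter table; treating it as 50 MiB is an assumption.
MAX_PDF_BYTES = 50 * 1024 * 1024


def check_source(file_bytes: Optional[bytes] = None, url: Optional[str] = None) -> str:
    """Return "file" or "url" when exactly one source is given, else raise ValueError."""
    # True when both are missing or both are present; either case breaks the rule.
    if (file_bytes is None) == (url is None):
        raise ValueError("provide exactly one of file or url")
    if file_bytes is not None:
        if len(file_bytes) > MAX_PDF_BYTES:
            raise ValueError("file exceeds the 50MB limit")
        return "file"
    return "url"
```

Calling `check_source(url="https://example.com/report.pdf")` returns `"url"`, while passing both arguments or neither raises `ValueError`.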
Parsing Options:
json
{
  "ocr_enabled": true,               // Enable OCR for scanned documents
  "table_extraction": true,          // Extract tables as structured data
  "preserve_layout": true,           // Maintain original layout structure
  "page_range": [1, 10],             // Specific pages to parse
  "financial_mode": true,            // Optimize for financial documents
  "extract_headers_footers": false,  // Include headers/footers
  "detect_signatures": true,         // Identify signature blocks
  "language": "en"                   // Document language for OCR
}
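The `//` comments in the listing are annotations only; strict JSON does not allow comments, so a client would send the options without them. The sketch below shows one way to assemble the object in Python. The helper name `build_parsing_options` is hypothetical, the values simply mirror the listing (they are not confirmed server defaults), and treating `page_range` as 1-based and inclusive is an assumption.

```python
import json

# Option names are taken from the listing above; the values mirror that listing
# and are NOT confirmed to be the server-side defaults.
EXAMPLE_OPTIONS = {
    "ocr_enabled": True,
    "table_extraction": True,
    "preserve_layout": True,
    "page_range": [1, 10],
    "financial_mode": True,
    "extract_headers_footers": False,
    "detect_signatures": True,
    "language": "en",
}


def build_parsing_options(**overrides) -> str:
    """Merge overrides into the example options and return comment-free JSON text."""
    unknown = set(overrides) - set(EXAMPLE_OPTIONS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    options = {**EXAMPLE_OPTIONS, **overrides}
    first, last = options["page_range"]
    # Assumption: pages are numbered from 1 and the range includes both ends.
    if first < 1 or last < first:
        raise ValueError("page_range must be [first, last] with 1 <= first <= last")
    return json.dumps(options)
```

For example, `build_parsing_options(page_range=[2, 3], ocr_enabled=False)` yields a JSON string with those two values changed and the rest left as listed.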
Example Request (File Upload)
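A file upload to `POST /parser/pdf` might be assembled as in the sketch below, which uses only the Python standard library and builds the request without sending it. Several details are assumptions not stated on this page: the host `https://api.example.com` is a placeholder, the multipart encoding and the choice to send `parsing_options` as a JSON-encoded form field are guesses, and no authentication header is shown because none is documented here.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://api.example.com"  # hypothetical host; this page only gives the path


def build_upload_request(pdf_bytes: bytes, filename: str, options: dict) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /parser/pdf."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        # Assumption: parsing_options is accepted as a JSON-encoded form field.
        'Content-Disposition: form-data; name="parsing_options"\r\n'
        "\r\n"
        f"{json.dumps(options)}\r\n"
        f"--{boundary}\r\n"
        # "file" is the parameter name from the Parameters table.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/parser/pdf",
        data=head + pdf_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; any credentials the service requires would need to be added to the headers first.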
Example Request (URL)
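When the PDF is referenced by URL, the request carries no file. The sketch below builds such a request without sending it. It assumes the endpoint accepts a JSON body containing `url` and an optional `parsing_options` object, which this page does not confirm; the host `https://api.example.com` is a placeholder and `build_url_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # hypothetical host; this page only gives the path


def build_url_request(pdf_url: str, options: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to /parser/pdf that points at a remote PDF."""
    payload = {"url": pdf_url}
    if options:
        payload["parsing_options"] = options
    return urllib.request.Request(
        f"{BASE_URL}/parser/pdf",
        # Assumption: the endpoint accepts application/json for URL-based parsing.
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

As with a file upload, `urllib.request.urlopen` would send the request once any required credentials are added.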
Response
json
Block Types
| Type | Description |
|---|---|
| heading | Title or section header with level (1-6) |
| paragraph | Standard text paragraph |
| table | Structured table with rows and columns |
| list | Bulleted or numbered list |
| image | Image or chart (base64 encoded) |
| footnote | Footnote or endnote text |
| header | Page header content |
| footer | Page footer content |
| page_number | Page numbering |
| signature | Signature block |
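A client will typically branch on the block type. The sketch below tallies blocks per type and collects any type it does not recognize. The response schema is not shown on this page, so the shape used here (a list of dicts, each with a `"type"` key) is an assumption, and `summarize_blocks` is a hypothetical helper name.

```python
from collections import Counter

# The ten block types listed in the table above.
BLOCK_TYPES = {
    "heading", "paragraph", "table", "list", "image",
    "footnote", "header", "footer", "page_number", "signature",
}


def summarize_blocks(blocks):
    """Return (counts per known type, list of unrecognized type values)."""
    counts = Counter()
    unknown = []
    for block in blocks:
        kind = block.get("type")  # assumption: each block is a dict with a "type" key
        if kind in BLOCK_TYPES:
            counts[kind] += 1
        else:
            unknown.append(kind)
    return dict(counts), unknown
```

Collecting unknown types instead of raising keeps a client working if the service later adds block types beyond the ten listed.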
Financial Mode Features
When financial_mode is enabled, the parser:
Identifies standard financial statements (Income Statement, Balance Sheet, Cash Flow)