PDF Parsing and Obtaining Result
Extract structured content from PDF documents with financial document optimization.
Endpoints
POST v1/parser/pdf
POST v1/parser/result
Description
These two APIs parse PDF files, with optimizations for financial documents, in two steps. Step 1: submit a PDF for parsing by calling v1/parser/pdf with a public PDF URL. Step 2: retrieve the parsing result by calling v1/parser/result with the unique ID returned in Step 1.
Our PDF parsing solution enables precise text extraction with layout fidelity, including table detection, financial statement recognition, and document structure preservation.
Parameters
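parser/pdf
Parameter | Type | Required | Description
pdf_url | string | Yes | URL of a publicly accessible PDF file
parser/result
Parameter | Type | Required | Description
info_id | string | Yes | Unique ID returned by Step 1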
Examples
Request of Step 1
curl -X 'POST' \
'https://api.orbitfin.ai/v1/parser/pdf' \
-H 'accept: application/json' \
-H 'X-API-KEY: Bearer YOUR_API-KEY' \
-H 'Content-Type: application/json' \
-d '{
"pdf_url": "https://static.cninfo.com.cn/finalpage/2025-09-02/1224634496.PDF"
}'
Response of Step 1
{
"status_code": 200,
"data": "1c670924-ca3a-3364-8777-78a04be3c36b",
"message": "success"
}
Request of Step 2
curl -X 'POST' \
'https://api.orbitfin.ai/v1/parser/result' \
-H 'accept: application/json' \
-H 'X-API-KEY: Bearer YOUR_API-KEY' \
-H 'Content-Type: application/json' \
-d '{
"info_id": "1c670924-ca3a-3364-8777-78a04be3c36b"
}'
Response of Step 2
{
"status_code": 200,
"data": "https://orbit-tmp.s3.amazonaws.com/api/1c670924-ca3a-3364-8777-78a04be3c36b.zip?AWSAccessKeyId=AKIAZ2SDT5DUYIU3G434&Signature=sxwVdUiIOqfQC5ulSCtltdw1d2U%3D&Expires=1761139467",
"message": "Success"
}
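The two curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names (build_request, call_parser) are illustrative and not part of the API, and the header format simply mirrors the curl examples.

```python
import json
import urllib.request

BASE_URL = "https://api.orbitfin.ai"  # host taken from the curl examples


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request that mirrors the curl examples (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "X-API-KEY": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_parser(path: str, payload: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON envelope (status_code, data, message)."""
    request = build_request(path, payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With these helpers, Step 1 would be call_parser("v1/parser/pdf", {"pdf_url": url}, api_key)["data"], and Step 2 would be call_parser("v1/parser/result", {"info_id": info_id}, api_key)["data"].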
Error Codes
Code | Message
400 | Download of parameter attachment address failed
400 | Insufficient user balance
400 | Attachment size exceeds 500MB
400 | PDF attachment exceeds 700 pages
401 | Unauthorized access. Invalid API key
500 | An error occurred while processing your request
503 | PDF file is corrupted and failed to parse
504 | PDF parsing timeout
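One way to act on these codes in client code is to turn a non-200 result into an exception. The sketch below assumes that error responses use the same status_code / data / message envelope as the success examples, which this page does not confirm. ParserApiError and unwrap are hypothetical names.

```python
class ParserApiError(Exception):
    """Raised when the response envelope reports a non-200 status (hypothetical)."""

    def __init__(self, status_code, message: str):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def unwrap(envelope: dict) -> str:
    """Return the `data` field on success, otherwise raise ParserApiError."""
    status = envelope.get("status_code")
    if status != 200:
        # Assumption: failures carry the same envelope as successes.
        raise ParserApiError(status, envelope.get("message", ""))
    return envelope["data"]
```

Several distinct failures share the 400 code, so a caller that needs to tell them apart would have to inspect the message text as well as the code.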
Notes:
All clickable links are valid for 7 days
The failure to process an individual report will not impact the processing of other reports
Parsing typically takes 1 to 10 minutes to produce the ZIP download link, depending on file size and the number of jobs in the queue.
The download link can be retrieved only once with the unique ID from Step 1, and it expires after 7 days.
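Because parsing can take several minutes and the link can be retrieved only once, a client will typically poll Step 2 and store the link as soon as it arrives. The sketch below is one way to structure that loop. This page does not document what Step 2 returns while a job is still running, so the fetch callable is assumed to return None until the link is ready. wait_for_link and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_link(
    fetch: Callable[[], Optional[str]],
    interval_s: float = 30.0,
    max_wait_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it returns a link or max_wait_s has been spent sleeping."""
    waited = 0.0
    while True:
        link = fetch()
        if link is not None:
            return link  # persist this immediately: it may not be retrievable again
        if waited >= max_wait_s:
            raise TimeoutError(f"no result after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

The default max_wait_s of 600 seconds follows the upper end of the 1 to 10 minute range given above. Passing sleep as a parameter keeps the loop easy to test without real delays.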
Example of ZIP file:
The ZIP file contains the original PDF file, the API metadata, and the parsing results in two formats: page level and block level. If the original PDF contains images, they are also extracted and stored in the images folder.
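Once downloaded, the archive can be inspected with Python's standard zipfile module. The sketch below only lists the members and picks out those under an images/ prefix. The exact member names and the position of the images folder inside the archive are not given on this page, so both helpers are illustrative.

```python
import io
import zipfile
from typing import List


def list_members(zip_bytes: bytes) -> List[str]:
    """Return the names of all files (not directories) in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


def image_members(names: List[str], folder: str = "images/") -> List[str]:
    """Pick out entries stored under the images folder (top-level location assumed)."""
    return [name for name in names if name.startswith(folder)]
```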
Structure:

api_metadata records the API version and the start and end times of file parsing.
{
"version": "1.0.0",
"start_time": "2025-10-15T13:55:29.605198+00:00",
"end_time": "2025-10-15T13:56:10.297724+00:00"
}
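The two timestamps are ISO 8601 strings with a UTC offset, so the parsing duration can be derived directly. A small sketch, using the sample values above; parsing_seconds is a hypothetical helper name.

```python
from datetime import datetime


def parsing_seconds(metadata: dict) -> float:
    """Elapsed time between start_time and end_time, in seconds."""
    start = datetime.fromisoformat(metadata["start_time"])
    end = datetime.fromisoformat(metadata["end_time"])
    return (end - start).total_seconds()


sample = {
    "version": "1.0.0",
    "start_time": "2025-10-15T13:55:29.605198+00:00",
    "end_time": "2025-10-15T13:56:10.297724+00:00",
}
print(parsing_seconds(sample))
```

For the sample metadata this comes to roughly 40.7 seconds.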
Page Level Document Sample
{"id": "p_uepCvuXe", "page": 1, "sentence": "<text blocks>"}
{"id": "p_apMbM49F", "page": 2, "sentence": "<text blocks>"}
{"id": "p_7MHhOdNN", "page": 3, "sentence": "<text blocks>"}
{"id": "p_cSSQYrRe", "page": 4, "sentence": "<text blocks>"}
Page Level Document Data Dictionary
Field | Type | Description
id | String | Unique ID of the text block in the database
page | Integer | Page number, in sequence
sentence | String | Text blocks
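The page-level sample appears to be in JSON Lines form, with one JSON object per line. Assuming that holds for the real file, a sketch for loading it into a page-to-text mapping could look like this; pages_to_text is a hypothetical helper.

```python
import json
from typing import Dict, Iterable


def pages_to_text(lines: Iterable[str]) -> Dict[int, str]:
    """Map page number -> text, reading one JSON object per non-empty line."""
    pages: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        pages[int(record["page"])] = record["sentence"]
    return pages
```

Any iterable of lines works as input, for example an open file object for the page-level file extracted from the ZIP.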
Block Level Document Sample
{"id": "l_XgLeH4iS", "page": 1, "seq_no": 1, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[32.5296, 811.296, 164.8296, 801.432]]}}
{"id": "l_D2o3ptsX", "page": 1, "seq_no": 2, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[31.067999999999998, 784.9872, 101.60640000000001, 775.4832]]}}
{"id": "l_XcKKlIqv", "page": 1, "seq_no": 3, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[158.9832, 750.996, 434.556, 705.6863999999999]]}}
{"id": "l_20yatCw6", "page": 1, "seq_no": 4, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[261.2448, 688.8744, 332.65439999999995, 675.72]]}}
Block Level Document Data Dictionary
Field | Type | Description
id | String | Unique ID of the text block in the database
page | Integer | Page number, in sequence
seq_no | Integer | Sequence number of the block within the page
sentence | String | Text blocks
type | String | text, table, image
text_location | Object | Coordinates of the block on the page
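Block-level records can be regrouped into reading order per page by sorting on seq_no. The sketch below assumes the same one-object-per-line layout as the sample. It returns the coordinate arrays as found and does not interpret the four numbers, since their meaning is not specified here. blocks_by_page and boxes are hypothetical helpers.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def blocks_by_page(lines: Iterable[str]) -> Dict[int, List[dict]]:
    """Group block records by page and order each page's blocks by seq_no."""
    grouped: Dict[int, List[dict]] = defaultdict(list)
    for line in lines:
        if line.strip():
            record = json.loads(line)
            grouped[int(record["page"])].append(record)
    for blocks in grouped.values():
        blocks.sort(key=lambda block: int(block["seq_no"]))
    return dict(grouped)


def boxes(block: dict) -> List[List[float]]:
    """Return the four-number arrays stored under text_location.location."""
    return block.get("text_location", {}).get("location", [])
```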