PDF Parsing and Obtaining Result

Extract structured content from PDF documents with financial document optimization.

Endpoints

POST v1/parser/pdf


POST v1/parser/result


Description

These two APIs provide PDF parsing optimized for financial documents in two steps. Step 1: submit a PDF for parsing by calling v1/parser/pdf with the public URL of the PDF file. Step 2: retrieve the parsing result by calling v1/parser/result with the unique ID returned by Step 1.

Our PDF parsing solution enables precise text extraction with layout fidelity, including table detection, financial statement recognition, and document structure preservation.
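
The two-step flow described above can be sketched in Python with the standard library only. This is a minimal sketch, not an official client: the host, paths, headers, and body fields are copied from the curl examples on this page, while the helper names (`build_request`, `send`, `submit_request`, `result_request`) are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.orbitfin.ai"  # host taken from the curl examples


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request for one of the two parser endpoints."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "X-API-KEY": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def submit_request(pdf_url: str, api_key: str) -> urllib.request.Request:
    # Step 1: submit a public PDF URL for parsing.
    return build_request("/v1/parser/pdf", {"pdf_url": pdf_url}, api_key)


def result_request(info_id: str, api_key: str) -> urllib.request.Request:
    # Step 2: ask for the result using the ID returned by Step 1.
    return build_request("/v1/parser/result", {"info_id": info_id}, api_key)
```

Under these assumptions, `send(submit_request(pdf_url, api_key))["data"]` would yield the unique ID, which is then passed to `result_request`.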

Parameters

parser/pdf

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| pdf_url | string | Yes | URL of a publicly accessible PDF file |

parser/result

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| info_id | string | Yes | Unique ID returned by Step 1 |

Examples

Request of Step 1

curl -X 'POST' \
  'https://api.orbitfin.ai/v1/parser/pdf' \
  -H 'accept: application/json' \
  -H 'X-API-KEY: YOUR_API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "pdf_url": "https://static.cninfo.com.cn/finalpage/2025-09-02/1224634496.PDF"
}'

Response of Step 1

{
  "status_code": 200,
  "data": "1c670924-ca3a-3364-8777-78a04be3c36b",
  "message": "success"
}

Request of Step 2

curl -X 'POST' \
  'https://api.orbitfin.ai/v1/parser/result' \
  -H 'accept: application/json' \
  -H 'X-API-KEY: YOUR_API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "info_id": "1c670924-ca3a-3364-8777-78a04be3c36b"
}'

Response of Step 2

{
  "status_code": 200,
  "data": "https://orbit-tmp.s3.amazonaws.com/api/1c670924-ca3a-3364-8777-78a04be3c36b.zip?AWSAccessKeyId=AKIAZ2SDT5DUYIU3G434&Signature=sxwVdUiIOqfQC5ulSCtltdw1d2U%3D&Expires=1761139467",
  "message": "Success"
}
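
The download URL in the sample response has the shape of an Amazon S3 pre-signed URL, where the `Expires` query parameter is a Unix timestamp in seconds. Assuming the service keeps returning URLs in that form (the page does not promise this), the expiry time can be read directly from the link. The function name `link_expiry` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def link_expiry(download_url: str) -> Optional[datetime]:
    """Return the UTC expiry encoded in the Expires parameter, if present."""
    query = parse_qs(urlparse(download_url).query)
    values = query.get("Expires")
    if not values:
        return None  # the URL carries no Expires parameter
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)
```

For the sample response, `Expires=1761139467` corresponds to 2025-10-22 13:24:27 UTC.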

Error Code

| Code | Description |
| --- | --- |
| 400 | Failed to download the file from the URL given in pdf_url |
| 400 | Insufficient user balance |
| 400 | Attachment size exceeds 500MB |
| 400 | PDF attachment exceeds 700 pages |
| 401 | Unauthorized access. Invalid API key |
| 500 | An error occurred while processing your request |
| 503 | PDF file is corrupted and failed to parse |
| 504 | PDF parsing timeout |
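
A client may want to turn these codes into exceptions. The sketch below assumes that errors arrive in the same JSON shape as the success samples (`status_code` and `message` fields), which this page does not confirm. Treating 500 and 504 as worth retrying is likewise an assumption of the sketch, and `ParserError` and `check_response` are hypothetical names.

```python
# Codes from the error table. Which of them are worth retrying is an
# assumption of this sketch, not something the API documents.
RETRYABLE = {500, 504}


class ParserError(Exception):
    """Raised when a response body reports a non-200 status_code."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message
        self.retryable = status_code in RETRYABLE


def check_response(body: dict) -> dict:
    """Return the body if it reports success, otherwise raise ParserError."""
    code = body.get("status_code")
    if code != 200:
        raise ParserError(code, body.get("message", ""))
    return body
```

Because several different 400 errors share one code, callers that need to tell them apart would have to inspect `message`.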

Notes:

All clickable links are valid for 7 days
A failure to process one report does not affect the processing of other reports
Parsing typically takes 1-10 minutes before the ZIP download link is available, depending on file size and the number of jobs in the queue.
The download link can be retrieved successfully only once with the unique ID generated in Step 1, and it expires after 7 days
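
Since parsing can take several minutes, a client usually polls Step 2 until the link is available. The page does not document what v1/parser/result returns while a job is still running, so the readiness check below (status_code 200 and a `data` string starting with "http") is an assumption. The fetch function is injected, and all names are hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_download_url(
    fetch_result: Callable[[], dict],
    interval_s: float = 30.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the result carries a URL, or give up after timeout_s."""
    waited = 0.0
    while True:
        body = fetch_result()
        data = body.get("data")
        # Assumption: a finished job returns status_code 200 and a URL string.
        ready = body.get("status_code") == 200 and isinstance(data, str)
        if ready and data.startswith("http"):
            return data
        if waited >= timeout_s:
            return None
        sleep(interval_s)
        waited += interval_s
```

Because the link can be retrieved only once per ID, the caller should store the returned URL (or download the ZIP immediately) rather than calling Step 2 again.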

Example of ZIP file:

The ZIP file contains the original PDF file, the API metadata, and the parsing results in two formats: page level and block level. If the original PDF contains images, they are also extracted and stored in the images folder.
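
Once the ZIP has been downloaded, its members can be inspected in memory. The exact file names inside the archive are not listed on this page, so the sketch only separates entries under a folder named `images` from everything else. The function name is hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import List, Tuple


def split_zip_members(zip_bytes: bytes) -> Tuple[List[str], List[str]]:
    """Return (image members, other members) of the downloaded archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    images, others = [], []
    for name in names:
        # Assumption: extracted images sit under a folder called "images".
        in_images = "images" in PurePosixPath(name).parts[:-1]
        (images if in_images else others).append(name)
    return images, others
```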

Structure:

api_metadata describes the API version and the start and end times of file parsing.

{
  "version": "1.0.0",
  "start_time": "2025-10-15T13:55:29.605198+00:00",
  "end_time": "2025-10-15T13:56:10.297724+00:00"
}
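
The two timestamps are ISO 8601 strings with a UTC offset, so the parsing duration can be computed with the standard library. A minimal sketch, assuming the metadata has already been loaded into a dict; the function name is hypothetical.

```python
from datetime import datetime


def parsing_seconds(api_metadata: dict) -> float:
    """Seconds elapsed between start_time and end_time."""
    start = datetime.fromisoformat(api_metadata["start_time"])
    end = datetime.fromisoformat(api_metadata["end_time"])
    return (end - start).total_seconds()


sample = {
    "version": "1.0.0",
    "start_time": "2025-10-15T13:55:29.605198+00:00",
    "end_time": "2025-10-15T13:56:10.297724+00:00",
}
duration = parsing_seconds(sample)  # 40.692526 seconds for this sample
```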

Page Level Document Sample

{"id": "p_uepCvuXe", "page": 1, "sentence": "<text blocks>"}
{"id": "p_apMbM49F", "page": 2, "sentence": "<text blocks>"}
{"id": "p_7MHhOdNN", "page": 3, "sentence": "<text blocks>"}
{"id": "p_cSSQYrRe", "page": 4, "sentence": "<text blocks>"}

Page Level Document Data Dictionary

| Field Name | Type | Description |
| --- | --- | --- |
| id | String | unique id of the page record in the database |
| page | Integer | page number in sequence |
| sentence | String | text of the page |
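
The page-level sample has one JSON object per line, so it can be read as JSON Lines. The sketch below builds a page-number-to-text mapping from such content. The name of the file inside the ZIP is not given on this page, so the function takes the text itself; `load_pages` is a hypothetical name.

```python
import json


def load_pages(jsonl_text: str) -> dict:
    """Map page number to page text from page-level JSON Lines content."""
    pages = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        # int() accepts both 1 and "1", whichever form the file uses.
        pages[int(record["page"])] = record["sentence"]
    return dict(sorted(pages.items()))
```

Joining the values in order, for example with `"\n".join(load_pages(text).values())`, gives the document text page by page.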

Block Level Document Sample

{"id": "l_XgLeH4iS", "page": 1, "seq_no": 1, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[32.5296, 811.296, 164.8296, 801.432]]}}
{"id": "l_D2o3ptsX", "page": 1, "seq_no": 2, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[31.067999999999998, 784.9872, 101.60640000000001, 775.4832]]}}
{"id": "l_XcKKlIqv", "page": 1, "seq_no": 3, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[158.9832, 750.996, 434.556, 705.6863999999999]]}}
{"id": "l_20yatCw6", "page": 1, "seq_no": 4, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[261.2448, 688.8744, 332.65439999999995, 675.72]]}}

Block Level Document Data Dictionary

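| Field Name | Type | Description |
| --- | --- | --- |
| id | String | unique id of the text block in the database |
| page | Integer | page number in sequence |
| seq_no | Integer | sequence number of the block within the page |
| sentence | String | text of the block |
| type | String | block type: text, table, or image (the sample above uses "sentence") |
| text_location | Object | coordinates of the block on the page, as a "location" list of four-number arrays |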
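
Block-level records can be grouped by page and ordered by seq_no to recover reading order. The sketch also derives a box size from `text_location`. Treating the four numbers as two corner points (x0, y0, x1, y1) is an assumption, since this page does not define their order or units, so the size is computed from absolute differences. Both function names are hypothetical.

```python
import json
from collections import defaultdict


def blocks_by_page(jsonl_text: str) -> dict:
    """Group block-level records by page, each list ordered by seq_no."""
    pages = defaultdict(list)
    for line in jsonl_text.splitlines():
        if line.strip():
            block = json.loads(line)
            pages[int(block["page"])].append(block)
    for blocks in pages.values():
        blocks.sort(key=lambda b: int(b["seq_no"]))
    return dict(pages)


def box_size(block: dict) -> tuple:
    """Width and height of the block's first box (assumed x0, y0, x1, y1)."""
    x0, y0, x1, y1 = block["text_location"]["location"][0]
    return abs(x1 - x0), abs(y1 - y0)
```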

