PDF Parsing and Result Retrieval

Extract structured content from PDF documents, with parsing optimized for financial documents.

Endpoints

POST v1/parser/pdf
POST v1/parser/result

Description

These two APIs provide PDF parsing optimized for financial documents in two steps. Step 1: submit a PDF for parsing by passing its public URL to v1/parser/pdf, which returns a unique ID. Step 2: retrieve the parsing result by passing that unique ID to v1/parser/result.

Our PDF parsing solution enables precise text extraction with layout fidelity, including table detection, financial statement recognition, and document structure preservation.

Parameters

parser/pdf

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| pdf_url | string | Yes | URL of a publicly accessible PDF file |

parser/result

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| info_id | string | Yes | Unique ID returned by Step 1 |

Examples

Request of Step 1

curl -X 'POST' \
  'https://api.orbitfin.ai/v1/parser/pdf' \
  -H 'accept: application/json' \
  -H 'X-API-KEY: Bearer YOUR_API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "pdf_url": "https://static.cninfo.com.cn/finalpage/2025-09-02/1224634496.PDF"
}'

Response of Step 1

{
  "status_code": 200,
  "data": "1c670924-ca3a-3364-8777-78a04be3c36b",
  "message": "success"
}

Request of Step 2

curl -X 'POST' \
  'https://api.orbitfin.ai/v1/parser/result' \
  -H 'accept: application/json' \
  -H 'X-API-KEY: Bearer YOUR_API-KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "info_id": "1c670924-ca3a-3364-8777-78a04be3c36b"
}'

Response of Step 2

{
  "status_code": 200,
  "data": "https://orbit-tmp.s3.amazonaws.com/api/1c670924-ca3a-3364-8777-78a04be3c36b.zip?AWSAccessKeyId=AKIAZ2SDT5DUYIU3G434&Signature=sxwVdUiIOqfQC5ulSCtltdw1d2U%3D&Expires=1761139467",
  "message": "Success"
}
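The two curl calls above can also be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names (build_request, send, submit_pdf, fetch_result) are hypothetical, and the header layout simply mirrors the curl examples.

```python
import json
import urllib.request

BASE_URL = "https://api.orbitfin.ai/v1/parser"


def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that mirrors the curl examples."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "X-API-KEY": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def submit_pdf(pdf_url: str, api_key: str) -> str:
    """Step 1: submit a public PDF URL and return the unique ID from 'data'."""
    return send(build_request("pdf", {"pdf_url": pdf_url}, api_key))["data"]


def fetch_result(info_id: str, api_key: str) -> dict:
    """Step 2: ask for the parsing result of a previously submitted PDF."""
    return send(build_request("result", {"info_id": info_id}, api_key))
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without calling the API.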

Error Code

| Code | Description |
| --- | --- |
| 400 | Download of parameter attachment address failed |
| 400 | Insufficient user balance |
| 400 | Attachment size exceeds 500MB |
| 400 | PDF attachment exceeds 700 pages |
| 401 | Unauthorized access. Invalid API key |
| 500 | An error occurred while processing your request |
| 503 | PDF file is corrupted and failed to parse |
| 504 | PDF parsing timeout |
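A client may want to translate these codes into a next step. The sketch below is one possible mapping; the Action names and the policy itself (for example, treating 500 and 504 as worth retrying) are assumptions, not documented behavior of the API.

```python
from enum import Enum


class Action(Enum):
    FIX_REQUEST = "fix_request"      # 400: bad URL, balance, size or page limit
    CHECK_API_KEY = "check_api_key"  # 401: invalid API key
    RETRY_LATER = "retry_later"      # 500 / 504: assumed to be transient
    REPLACE_FILE = "replace_file"    # 503: the PDF itself is corrupted
    UNKNOWN = "unknown"              # any code not listed in the table


# Assumed policy: retrying is only suggested for 500 and 504.
_ACTIONS = {
    400: Action.FIX_REQUEST,
    401: Action.CHECK_API_KEY,
    500: Action.RETRY_LATER,
    503: Action.REPLACE_FILE,
    504: Action.RETRY_LATER,
}


def suggest_action(status_code: int) -> Action:
    """Map an error code from the table to a suggested client-side action."""
    return _ACTIONS.get(status_code, Action.UNKNOWN)
```

Because four different conditions share code 400, the message text is still needed to tell them apart.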

Notes:

  • All download links are valid for 7 days.

  • A failure to process one report does not affect the processing of other reports.

  • Parsing typically takes 1-10 minutes before the ZIP download link is available, depending on file size and the number of jobs in the queue.

  • The download link can be retrieved successfully only once with the unique ID from Step 1, and it expires after 7 days.
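Since parsing can take several minutes, a client will usually poll Step 2 until the link is ready. The sketch below shows one way to do that. It assumes that a response is "ready" when status_code is 200 and data is non-empty; how the API answers while a job is still pending is not documented here. The function name, the interval, and the maximum wait are all hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_download_url(
    fetch_result: Callable[[], dict],
    interval_s: float = 30.0,
    max_wait_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll until a download URL is returned or max_wait_s would be exceeded.

    fetch_result is any zero-argument callable that performs the Step 2
    request and returns the decoded JSON body.
    """
    deadline = clock() + max_wait_s
    while True:
        body = fetch_result()
        # Assumed readiness check: success code plus a non-empty 'data' field.
        if body.get("status_code") == 200 and body.get("data"):
            return body["data"]
        # Stop if another full interval would run past the deadline.
        if clock() + interval_s > deadline:
            return None
        sleep(interval_s)
```

Because the link can be retrieved only once, the caller should store the returned URL immediately instead of requesting it again.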

Example of ZIP file:

The ZIP file contains the original PDF file, the API metadata, and the parsing results in two formats: page level and block level. If the original PDF contains images, they are also extracted and stored in the images folder.
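Once the ZIP has been downloaded, its contents can be inspected before extraction. The following is a small sketch that separates files under an images folder from everything else. The exact member names inside the archive are not documented here, so the function only relies on the folder name "images"; the function name is hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Dict, List


def summarize_archive(zip_bytes: bytes) -> Dict[str, List[str]]:
    """Split archive member names into extracted images and everything else."""
    summary: Dict[str, List[str]] = {"images": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries
            # A member counts as an image if any parent folder is 'images'.
            in_images = "images" in PurePosixPath(name).parts[:-1]
            summary["images" if in_images else "other"].append(name)
    return summary
```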

Structure:

api_metadata records the API version and the start and end times of file parsing.

{
  "version": "1.0.0",
  "start_time": "2025-10-15T13:55:29.605198+00:00",
  "end_time": "2025-10-15T13:56:10.297724+00:00"
}
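The two timestamps are ISO 8601 strings with a UTC offset, so the parsing duration can be derived from them. This is a minimal sketch; the function name is hypothetical and the input is assumed to be the api_metadata JSON shown above.

```python
import json
from datetime import datetime


def parsing_duration_seconds(metadata_json: str) -> float:
    """Return the time between start_time and end_time, in seconds."""
    meta = json.loads(metadata_json)
    start = datetime.fromisoformat(meta["start_time"])
    end = datetime.fromisoformat(meta["end_time"])
    return (end - start).total_seconds()
```

For the sample above, the result is roughly 40.7 seconds.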

Page Level Document Sample

{"id": "p_uepCvuXe", "page": 1, "sentence": "<text blocks>"}
{"id": "p_apMbM49F", "page": 2, "sentence": "<text blocks>"}
{"id": "p_7MHhOdNN", "page": 3, "sentence": "<text blocks>"}
{"id": "p_cSSQYrRe", "page": 4, "sentence": "<text blocks>"}

Page Level Document Data Dictionary

| Field Name | Type | Description |
| --- | --- | --- |
| id | String | Unique ID of the text block in the database |
| page | Integer | Page number, in sequence |
| sentence | String | Text blocks on the page |
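The page level sample has one JSON object per line, so it can be read line by line. The sketch below builds a page-number-to-text mapping; the function name is hypothetical, and the name of the page level file inside the ZIP is not assumed, so the function takes the file content as a string.

```python
import json
from typing import Dict


def load_pages(jsonl_text: str) -> Dict[int, str]:
    """Parse page level records (one JSON object per line) into {page: text}."""
    pages: Dict[int, str] = {}
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        pages[int(record["page"])] = record["sentence"]
    return pages
```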

Block Level Document Sample

{"id": "l_XgLeH4iS", "page": 1, "seq_no": 1, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[32.5296, 811.296, 164.8296, 801.432]]}}
{"id": "l_D2o3ptsX", "page": 1, "seq_no": 2, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[31.067999999999998, 784.9872, 101.60640000000001, 775.4832]]}}
{"id": "l_XcKKlIqv", "page": 1, "seq_no": 3, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[158.9832, 750.996, 434.556, 705.6863999999999]]}}
{"id": "l_20yatCw6", "page": 1, "seq_no": 4, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[261.2448, 688.8744, 332.65439999999995, 675.72]]}}

Block Level Document Data Dictionary

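| Field Name | Type | Description |
| --- | --- | --- |
| id | String | Unique ID of the text block in the database |
| page | Integer | Page number, in sequence |
| seq_no | Integer | Sequence number of the block within the page |
| sentence | String | Text of the block |
| type | String | Block type, such as text, table, or image (the sample above shows sentence) |
| text_location | Object | Coordinates of the block on the page, given as a location list of four-number arrays |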
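Block level records can be grouped by page and ordered by seq_no to rebuild the reading order. The sketch below does that and also pulls out the first coordinate array of a block. The function names are hypothetical, and the meaning of the four coordinate values (for example, which corners they describe) is not documented here, so the code returns them unchanged.

```python
import json
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def blocks_by_page(jsonl_text: str) -> Dict[int, List[dict]]:
    """Group block level records by page, sorted by seq_no within each page."""
    grouped: Dict[int, List[dict]] = defaultdict(list)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        block = json.loads(line)
        grouped[int(block["page"])].append(block)
    for blocks in grouped.values():
        blocks.sort(key=lambda b: int(b["seq_no"]))
    return dict(grouped)


def first_box(block: dict) -> Optional[Tuple[float, ...]]:
    """Return the first coordinate array of a block, or None if it has none."""
    locations = block.get("text_location", {}).get("location", [])
    return tuple(locations[0]) if locations else None
```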
