# PDF Parsing and Obtaining Result

## Endpoints

```
POST v1/parser/pdf
```

[**Try it out in API Playground →**](https://api.orbitfin.ai/docs#/extract/extract_pdf_v1_parser_pdf_post)

```
POST v1/parser/result
```

[**Try it out in API Playground →**](https://api.orbitfin.ai/docs#/extract/extract_info_v1_parser_result_post)

## **Description**

These two endpoints parse PDF files, with processing optimized for financial documents, in two steps. Step 1: submit a publicly accessible PDF URL to `v1/parser/pdf`, which returns a unique ID for the file. Step 2: pass that ID to `v1/parser/result` to retrieve the parsing result.

Our PDF parsing solution enables precise text extraction with layout fidelity, including table detection, financial statement recognition, and document structure preservation.

## Parameters

#### parser/pdf

| Parameter | Type   | Required | Description           |
| --------- | ------ | -------- | --------------------- |
| pdf\_url  | string | Yes      | URL of a publicly accessible PDF file |

#### parser/result

| Parameter | Type   | Required | Description               |
| --------- | ------ | -------- | ------------------------- |
| info\_id  | string | Yes      | Unique ID returned in the `data` field of the Step 1 response |

## Examples

#### Request of Step 1

```
curl -X 'POST' \
  'https://api.orbitfin.ai/v1/parser/pdf' \
  -H 'accept: application/json' \
  -H 'X-API-KEY: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "pdf_url": "https://static.cninfo.com.cn/finalpage/2025-09-02/1224634496.PDF"
}'
```

#### Response of Step 1

```json
{
  "status_code": 200,
  "data": "1c670924-ca3a-3364-8777-78a04be3c36b",
  "message": "success"
}
```

#### Request of Step 2

```
curl -X 'POST' \
  'https://api.orbitfin.ai/v1/parser/result' \
  -H 'accept: application/json' \
  -H 'X-API-KEY: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "info_id": "1c670924-ca3a-3364-8777-78a04be3c36b"
}'
```

#### Response of Step 2

```json
{
  "status_code": 200,
  "data": "https://orbit-tmp.s3.amazonaws.com/api/1c670924-ca3a-3364-8777-78a04be3c36b.zip?AWSAccessKeyId=AKIAZ2SDT5DUYIU3G434&Signature=sxwVdUiIOqfQC5ulSCtltdw1d2U%3D&Expires=1761139467",
  "message": "Success"
}
```
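The two curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_request`, `extract_data`, and `call` are illustrative and not part of the API, and the sketch assumes every response body carries the `status_code`, `data`, and `message` fields shown in the samples.

```python
import json
import urllib.request

BASE_URL = "https://api.orbitfin.ai"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl examples."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "X-API-KEY": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_data(response_text: str) -> str:
    """Return the `data` field of a response body, or raise if the call failed."""
    body = json.loads(response_text)
    if body.get("status_code") != 200:
        raise RuntimeError(f"API error {body.get('status_code')}: {body.get('message')}")
    return body["data"]


def call(path: str, payload: dict, api_key: str) -> str:
    """Send the request and return its `data` field (performs a network call)."""
    with urllib.request.urlopen(build_request(path, payload, api_key)) as resp:
        return extract_data(resp.read().decode("utf-8"))


# Step 1: request for the sample PDF, and the ID parsed from the sample response.
step1 = build_request(
    "/v1/parser/pdf",
    {"pdf_url": "https://static.cninfo.com.cn/finalpage/2025-09-02/1224634496.PDF"},
    "YOUR_API_KEY",
)
info_id = extract_data(
    '{"status_code": 200, "data": "1c670924-ca3a-3364-8777-78a04be3c36b", "message": "success"}'
)

# Step 2: the ID from Step 1 is sent back as `info_id`.
step2 = build_request("/v1/parser/result", {"info_id": info_id}, "YOUR_API_KEY")
```

In a live client, `call("/v1/parser/pdf", {...}, api_key)` would return the ID and `call("/v1/parser/result", {"info_id": ...}, api_key)` would return the download URL.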

## Error Codes

| Code | Description                                       |
| ---- | ------------------------------------------------- |
| 400  | Failed to download the file from the provided URL |
| 400  | Insufficient user balance                         |
| 400  | Attachment size exceeds 500 MB                    |
| 400  | PDF attachment exceeds 700 pages                  |
| 401  | Unauthorized access. Invalid API key              |
| 500  | An error occurred while processing your request   |
| 503  | PDF file is corrupted and failed to parse         |
| 504  | PDF parsing timeout                               |
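Because several distinct failures share code 400, client code may want to keep the message alongside the code. The sketch below assumes errors arrive in the same `status_code` and `message` fields as the success samples, which this page does not confirm. `ParserApiError` and `raise_for_status` are hypothetical names, and treating 500 and 504 as retryable is a judgment call rather than documented behavior.

```python
# Assumption: only generic server errors and timeouts are worth retrying.
RETRYABLE_CODES = {500, 504}


class ParserApiError(Exception):
    """Hypothetical exception carrying the API's status code and message."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def raise_for_status(body: dict) -> dict:
    """Return the body unchanged on success, otherwise raise ParserApiError."""
    code = body.get("status_code")
    if code != 200:
        raise ParserApiError(code, body.get("message", ""))
    return body


try:
    raise_for_status({"status_code": 504, "message": "PDF parsing timeout"})
except ParserApiError as err:
    should_retry = err.retryable
```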

#### Notes:

* All clickable links are valid for 7 days.
* A failure to process one report does not affect the processing of other reports.
* Parsing typically takes 1-10 minutes to generate the ZIP download link, depending on file size and the number of jobs in the queue.
* The download link can be retrieved only once using the unique ID generated in Step 1, and it expires after 7 days.
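Since parsing can take several minutes, a client usually has to poll Step 2 until the link is ready. The sketch below is one way to do that under a loudly stated assumption: this page does not describe what `v1/parser/result` returns while a job is still running, so the code treats any body without an `http` URL in `data` as "not ready yet". `wait_for_download_link` and its parameters are hypothetical names.

```python
import time
from typing import Callable


def wait_for_download_link(
    fetch_result: Callable[[], dict],
    interval_s: float = 30.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_result` until its body contains a download URL, then return it.

    ASSUMPTION: a job that is still parsing yields a body without a usable
    `data` URL; the intermediate response is not documented.
    """
    waited = 0.0
    while True:
        body = fetch_result()
        link = body.get("data")
        if body.get("status_code") == 200 and isinstance(link, str) and link.startswith("http"):
            # The link can be retrieved only once, so the caller should store it right away.
            return link
        if waited >= timeout_s:
            raise TimeoutError(f"no download link after {waited:.0f}s: {body.get('message')}")
        sleep(interval_s)
        waited += interval_s
```

Here `fetch_result` is any zero-argument callable that performs the Step 2 request and returns the decoded JSON body. Passing it in keeps the polling logic independent of the HTTP client.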

## **Example of ZIP file:**

The ZIP file contains the original PDF file, the API metadata, and the parsing results in two formats: page level and block level. If the original PDF contains images, they can also be extracted and stored in the images folder.
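To check what a downloaded archive contains, its members can be grouped by folder. This is a minimal sketch using Python's `zipfile` module; `group_members_by_folder` is an illustrative helper, and the member names in the demo archive are hypothetical stand-ins, not the actual file names used by the service.

```python
import io
import zipfile
from collections import defaultdict


def group_members_by_folder(zip_bytes: bytes) -> dict:
    """Map each folder path ('' for the archive root) to its sorted file names."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            folder, _, filename = name.rpartition("/")
            groups[folder].append(filename)
    return {folder: sorted(files) for folder, files in groups.items()}


# Demo archive built in memory; these member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("api_metadata.json", "{}")
    demo.writestr("images/figure_1.png", b"")

layout = group_members_by_folder(buffer.getvalue())
```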

### Structure:

<figure><img src="/files/0JcWISI2bznVpaau3Ny8" alt=""><figcaption></figcaption></figure>

api\_metadata records the API version and the start and end times of file parsing.

```json
{
  "version": "1.0.0",
  "start_time": "2025-10-15T13:55:29.605198+00:00",
  "end_time": "2025-10-15T13:56:10.297724+00:00"
}
```
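The two timestamps are ISO 8601 strings with a UTC offset, so the parsing duration can be derived directly. A small sketch, where `parsing_duration_seconds` is an illustrative helper name:

```python
import json
from datetime import datetime


def parsing_duration_seconds(metadata_json: str) -> float:
    """Return end_time minus start_time, in seconds."""
    meta = json.loads(metadata_json)
    start = datetime.fromisoformat(meta["start_time"])
    end = datetime.fromisoformat(meta["end_time"])
    return (end - start).total_seconds()


sample = """
{
  "version": "1.0.0",
  "start_time": "2025-10-15T13:55:29.605198+00:00",
  "end_time": "2025-10-15T13:56:10.297724+00:00"
}
"""
duration = parsing_duration_seconds(sample)  # about 40.7 seconds for this sample
```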

### Page Level Document Sample

```
{"id": "p_uepCvuXe", "page": 1, "sentence": "<text blocks>"}
{"id": "p_apMbM49F", "page": 2, "sentence": "<text blocks>"}
{"id": "p_7MHhOdNN", "page": 3, "sentence": "<text blocks>"}
{"id": "p_cSSQYrRe", "page": 4, "sentence": "<text blocks>"}
```

### Page Level Document Data Dictionary

| Field Name | Type    | Description                                 |
| ---------- | ------- | ------------------------------------------- |
| id         | String  | unique ID of the text block in the database |
| page       | Integer | page number in sequence                     |
| sentence   | String  | text of the page                            |
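The page-level sample holds one JSON object per line, so it can be read line by line. The sketch below assumes that layout applies to the whole file; `load_pages` and `full_text` are illustrative names, and the `sentence` values in the demo are placeholders.

```python
import json
from typing import Dict, Iterable, List


def load_pages(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Parse page-level records into {page number: [text, ...]}."""
    pages: Dict[int, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        pages.setdefault(int(record["page"]), []).append(record["sentence"])
    return pages


def full_text(pages: Dict[int, List[str]], separator: str = "\n\n") -> str:
    """Join the text of all pages in ascending page order."""
    return separator.join(text for n in sorted(pages) for text in pages[n])


demo_lines = [
    '{"id": "p_apMbM49F", "page": 2, "sentence": "second page text"}',
    '{"id": "p_uepCvuXe", "page": 1, "sentence": "first page text"}',
]
pages = load_pages(demo_lines)
document = full_text(pages)
```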

### Block Level Document Sample

```
{"id": "l_XgLeH4iS", "page": 1, "seq_no": 1, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[32.5296, 811.296, 164.8296, 801.432]]}}
{"id": "l_D2o3ptsX", "page": 1, "seq_no": 2, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[31.067999999999998, 784.9872, 101.60640000000001, 775.4832]]}}
{"id": "l_XcKKlIqv", "page": 1, "seq_no": 3, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[158.9832, 750.996, 434.556, 705.6863999999999]]}}
{"id": "l_20yatCw6", "page": 1, "seq_no": 4, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[261.2448, 688.8744, 332.65439999999995, 675.72]]}}
```

### Block Level Document Data Dictionary

| Field Name     | Type    | Description                                                               |
| -------------- | ------- | ------------------------------------------------------------------------- |
| id             | String  | unique ID of the text block in the database                               |
| page           | Integer | page number in sequence                                                   |
| seq\_no        | Integer | sequence number of the block within the page                              |
| sentence       | String  | text of the block                                                         |
| type           | String  | block type, such as text, table, or image (the sample shows `sentence`)   |
| text\_location | Object  | coordinates of the block on the page, as a list under the `location` key  |
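Block-level records can be loaded into a small typed structure and sorted into reading order by `page` and `seq_no`. The sketch below assumes one JSON object per line, as in the sample, and assumes the four numbers in each `location` entry are two corner points `(x0, y0, x1, y1)`; the page does not define the coordinate order, so `box_size` takes absolute differences rather than relying on an axis direction. `Block`, `parse_block`, `box_size`, and `in_reading_order` are illustrative names.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class Block:
    id: str
    page: int
    seq_no: int
    text: str
    type: str
    boxes: List[Box]


def parse_block(line: str) -> Block:
    """Convert one block-level JSON line into a Block."""
    record = json.loads(line)
    location = record.get("text_location", {}).get("location", [])
    return Block(
        id=record["id"],
        page=int(record["page"]),
        seq_no=int(record["seq_no"]),
        text=record["sentence"],
        type=record["type"],
        boxes=[tuple(box) for box in location],
    )


def box_size(box: Box) -> Tuple[float, float]:
    """Return (width, height), assuming the box is (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return abs(x1 - x0), abs(y1 - y0)


def in_reading_order(lines: Iterable[str]) -> List[Block]:
    """Parse all non-empty lines and sort blocks by page, then seq_no."""
    blocks = [parse_block(line) for line in lines if line.strip()]
    return sorted(blocks, key=lambda b: (b.page, b.seq_no))


# Two records from the sample above, deliberately out of order.
sample = [
    '{"id": "l_D2o3ptsX", "page": 1, "seq_no": 2, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[31.067999999999998, 784.9872, 101.60640000000001, 775.4832]]}}',
    '{"id": "l_XgLeH4iS", "page": 1, "seq_no": 1, "sentence": "<text blocks>", "type": "sentence", "text_location": {"location": [[32.5296, 811.296, 164.8296, 801.432]]}}',
]
ordered = in_reading_order(sample)
width, height = box_size(ordered[0].boxes[0])
```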

