# 2.4 From documents to structured data

Between the raw sources and the agents that do the work sits the layer that makes Orbit fast and consistent: the structured data layer. This is where documents stop being only text to be read and become data to be queried — without ever ceasing to be documents you can still read.

A filing, a transcript, or a research note is, in its raw form, a wall of language. To answer a question from it, something has to read it, find the relevant facts, and interpret them in context. Doing that fresh every time a question is asked is slow, expensive, and inconsistent: two analysts asking the same question of the same document can come away with different answers, and the same analyst asking twice may not get the same result. Orbit removes that repetition by doing the reading once, in advance. As documents enter the platform, they are processed into structured records: the facts, figures, and statements they contain, organised and anchored to the right company through the entity master. From then on, questions that the structured layer can answer are answered from it, rather than from the raw text underneath.
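The "read once, answer many times" idea can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: `StructuredRecord`, `ENTITY_MASTER`, `ingest`, and `query` are invented names, the extraction step is stubbed out as a ready-made dictionary of facts, and none of it reflects Orbit's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredRecord:
    entity_id: str   # canonical company ID (hypothetical format)
    metric: str      # name of the extracted fact
    value: float
    source_doc: str  # document the fact was read from


# Hypothetical entity master: maps names as they appear in documents
# to a single canonical identifier.
ENTITY_MASTER = {"Acme Corp": "ent-001", "ACME Corporation": "ent-001"}

store: list[StructuredRecord] = []


def ingest(doc_id: str, company_name: str, facts: dict[str, float]) -> None:
    """Runs once per document, at ingestion time.

    `facts` stands in for the output of the (expensive) reading step.
    """
    entity_id = ENTITY_MASTER[company_name]
    for metric, value in facts.items():
        store.append(StructuredRecord(entity_id, metric, value, doc_id))


def query(entity_id: str, metric: str) -> list[float]:
    """Runs at question time: a cheap lookup, with no document re-reading."""
    return [r.value for r in store
            if r.entity_id == entity_id and r.metric == metric]


# Two filings that spell the company name differently resolve to one entity.
ingest("10-K-2023", "Acme Corp", {"revenue": 120.0})
ingest("10-K-2024", "ACME Corporation", {"revenue": 135.0})
revenue_history = query("ent-001", "revenue")
```

In this sketch the costly work happens inside `ingest`, so every later call to `query` returns the same stored values regardless of who asks or how often.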

<figure><img src="/files/C5q2VEZFBXj5gyBPeloF" alt=""><figcaption></figcaption></figure>

This is the principle introduced in Part 1, *structured-first*, made concrete. It has three consequences that run through everything built on top of it.

The first is **consistency.** Because the reading is done once and stored, every user works from the same structured result. The answer to a question about a company does not depend on who asked, or when, or how they phrased it. Over time, this is what allows a company to be compared against itself across quarters and years, and against its peers, on a like-for-like basis — because the underlying records were produced the same way.

The second is **speed and scale.** Answering from structured data is far quicker and far cheaper than re-reading documents on demand. That difference is what makes it practical to work across an entire universe of companies rather than one name at a time. Monitoring thousands of holdings overnight, screening a whole market against a set of criteria, or running the same analysis across a sector are all feasible precisely because the heavy reading has already happened — the work at query time is light.

The third is **traceability.** Structured records do not float free of their origin. Each one carries its lineage back to the document it came from, so any figure or statement can be traced to its source and checked. The structured layer is faster to work with, but it never becomes a black box.
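One way to picture lineage is a record that stores a pointer back into its source text. The sketch below is an assumption-laden illustration, not Orbit's data model: `Lineage`, `Record`, `DOCUMENTS`, and `trace` are hypothetical names, and character offsets are just one possible way to locate a passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    doc_id: str
    start: int  # character offset where the source passage begins
    end: int    # character offset where it ends (exclusive)


@dataclass(frozen=True)
class Record:
    metric: str
    value: str
    lineage: Lineage


# Hypothetical document store holding the raw text.
DOCUMENTS = {
    "call-q2": "Management expects revenue growth of 8% next year, subject to FX.",
}


def trace(record: Record) -> str:
    """Return the exact source passage a record was derived from."""
    text = DOCUMENTS[record.lineage.doc_id]
    return text[record.lineage.start:record.lineage.end]


# Build a record whose lineage points at the phrase it came from.
passage = "revenue growth of 8%"
start = DOCUMENTS["call-q2"].index(passage)
rec = Record(
    metric="guided_revenue_growth",
    value="8%",
    lineage=Lineage("call-q2", start, start + len(passage)),
)
source_text = trace(rec)
```

Because the record carries its own pointer, checking a figure means calling `trace` and reading the passage, with no need to search the document again.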

**Two layers, by design**

It is important to be clear about what the structured layer is not. It is not an attempt to reduce every document to fields, and it never will be — because that is not possible. Financial documents carry nuance, narrative, qualification, and judgment that no structured schema can fully capture. The exact wording of a risk disclosure, the tone of a management answer on an earnings call, the caveats around a guidance figure — much of what matters in research lives in language, and resists being flattened into data without losing something.

For that reason, Orbit keeps the **document layer permanently available and directly queryable**, alongside the structured layer. This is by design, not a temporary state on the way to full structuring. The raw documents are not an archive sitting behind the structured records; they are a live layer in their own right, searchable and readable, so that any question — including the ones that depend on exact language or context that no structured field would hold — can be answered from the source itself.

The two layers work together. The structured layer answers the large majority of questions quickly, consistently, and at scale. The document layer answers the rest — the questions that need the words themselves — and stands as the source of truth and audit trail beneath everything. A user never has to choose between them: Orbit draws on the structured records where they apply and the underlying documents where they are needed, and presents one answer, with its sources attached.
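The way the two layers combine into one answer can be sketched as a simple fallback. This is a toy illustration under stated assumptions: `STRUCTURED`, `DOCUMENTS`, and `answer` are hypothetical, the document search is a naive substring match, and the real routing logic is not described in this page.

```python
# Hypothetical structured layer: (entity, metric) -> (value, source document).
STRUCTURED = {
    ("ent-001", "revenue_fy2024"): (135.0, "10-K-2024"),
}

# Hypothetical document layer: doc_id -> (entity, raw text).
DOCUMENTS = {
    "10-K-2024": (
        "ent-001",
        "Risk factors: demand may soften if rates stay elevated. "
        "Revenue was 135.0.",
    ),
}


def answer(entity_id, metric=None, phrase=None):
    """Use a structured record when one applies, else search the documents.

    Either way the result carries the sources it was drawn from.
    """
    key = (entity_id, metric)
    if key in STRUCTURED:
        value, doc_id = STRUCTURED[key]
        return {"answer": value, "layer": "structured", "sources": [doc_id]}

    # Fallback: return the sentences that contain the requested wording.
    hits = []
    for doc_id, (owner, text) in DOCUMENTS.items():
        if owner != entity_id or not phrase:
            continue
        for sentence in text.split(". "):
            if phrase.lower() in sentence.lower():
                hits.append((sentence.strip(), doc_id))
    return {
        "answer": [sentence for sentence, _ in hits],
        "layer": "document",
        "sources": sorted({doc_id for _, doc_id in hits}),
    }


figure = answer("ent-001", metric="revenue_fy2024")
wording = answer("ent-001", phrase="risk factors")
```

Here the caller asks one function and never picks a layer; the `layer` and `sources` fields in the result only record where the answer came from.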

This pairing — structured records as the working layer, documents as the permanent, queryable ground beneath them — is what lets Orbit be both fast and complete. The agents in the chapters that follow are built on top of both. Where a structured answer exists, they use it, which is what makes their work consistent and economical at scale; where a question reaches past it, the document layer is there, by design, to answer from the source.

