2.4 From documents to structured data
Between the raw sources and the agents that do the work sits the layer that makes Orbit fast and consistent: the structured data layer. This is where documents stop being only text to be read and become data to be queried — without ever ceasing to be documents you can still read.
A filing, a transcript, or a research note is, in its raw form, a wall of language. To answer a question from it, something has to read it, find the relevant facts, and interpret them in context. Doing that fresh every time a question is asked is slow, expensive, and inconsistent: two analysts asking the same question of the same document can come away with different answers, and the same analyst asking twice may not get the same result. Orbit removes that repetition by doing the reading once, in advance. As documents enter the platform, they are processed into structured records: the facts, figures, and statements they contain, organised and anchored to the right company through the entity master. From then on, questions that the structured layer can answer are answered from it, rather than from the raw text underneath.

This is the structured-first principle introduced in Part 1, made concrete. It has three consequences that run through everything built on top of this layer.
The first is consistency. Because the reading is done once and stored, every user works from the same structured result. The answer to a question about a company does not depend on who asked, or when, or how they phrased it. Over time, this is what allows a company to be compared against itself across quarters and years, and against its peers, on a like-for-like basis — because the underlying records were produced the same way.
The second is speed and scale. Answering from structured data is far quicker and far cheaper than re-reading documents on demand. That difference is what makes it practical to work across an entire universe of companies rather than one name at a time. Monitoring thousands of holdings overnight, screening a whole market against a set of criteria, or running the same analysis across a sector are all feasible precisely because the heavy reading has already happened — the work at query time is light.
The third is traceability. Structured records do not float free of their origin. Each one carries its lineage back to the document it came from, so any figure or statement can be traced to its source and checked. The structured layer is faster to work with, but it never becomes a black box.
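As a rough illustration of a record that carries its lineage, the sketch below shows one possible shape in Python. The type and field names (`SourceRef`, `StructuredRecord`, `entity_id`, `locator`) and all the values are hypothetical placeholders chosen for this example; they are not Orbit's actual schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    """Lineage pointer back to the originating document (hypothetical shape)."""
    document_id: str  # identifier of the filing, transcript, or note
    locator: str      # where in the document the fact was found


@dataclass(frozen=True)
class StructuredRecord:
    """One fact extracted once, at ingestion time (illustrative fields only)."""
    entity_id: str    # company key, as resolved through the entity master
    metric: str
    period: str
    value: float
    source: SourceRef  # every record keeps a reference to its origin


def trace(record: StructuredRecord) -> str:
    """Return a human-readable pointer to the record's source document."""
    return f"{record.source.document_id} @ {record.source.locator}"


# Placeholder values, invented purely for the example.
record = StructuredRecord(
    entity_id="ENT-0001",
    metric="revenue",
    period="FY2023",
    value=1250.0,
    source=SourceRef(document_id="DOC-42", locator="page 17"),
)
print(trace(record))  # DOC-42 @ page 17
```

Making the source reference a required field, rather than an optional one, is one way to ensure that no record can exist without a path back to the document it came from.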
Two layers, by design
It is important to be clear about what the structured layer is not. It is not an attempt to reduce every document to fields, and it never will be — because that is not possible. Financial documents carry nuance, narrative, qualification, and judgment that no structured schema can fully capture. The exact wording of a risk disclosure, the tone of a management answer on an earnings call, the caveats around a guidance figure — much of what matters in research lives in language, and resists being flattened into data without losing something.
For that reason, Orbit keeps the document layer permanently available and directly queryable, alongside the structured layer. This is by design, not a temporary state on the way to full structuring. The raw documents are not an archive sitting behind the structured records; they are a live layer in their own right, searchable and readable, so that any question — including the ones that depend on exact language or context that no structured field would hold — can be answered from the source itself.
The two layers work together. The structured layer answers the large majority of questions quickly, consistently, and at scale. The document layer answers the rest — the questions that need the words themselves — and stands as the source of truth and audit trail beneath everything. A user never has to choose between them: Orbit draws on the structured records where they apply and the underlying documents where they are needed, and presents one answer, with its sources attached.
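The way the two layers combine can be sketched as a simple structured-first lookup with a fallback to the documents. This is a minimal illustration under stated assumptions, not Orbit's implementation: the in-memory stores, the query format, the substring search, and the names `structured_layer`, `document_layer`, and `route` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Answer:
    text: str
    layer: str                # "structured" or "document"
    sources: tuple[str, ...]  # document ids the answer can be traced to


# Hypothetical in-memory stand-ins for the two layers.
STRUCTURED = {"ENT-0001/revenue/FY2023": ("1250.0", "DOC-42")}
DOCUMENTS = {
    "DOC-42": "Management noted that guidance remains subject to currency risk.",
}


def structured_layer(query: str) -> Optional[Answer]:
    """Answer from pre-extracted records, if one matches the query key."""
    hit = STRUCTURED.get(query)
    if hit is None:
        return None
    value, doc_id = hit
    return Answer(text=value, layer="structured", sources=(doc_id,))


def document_layer(query: str) -> Optional[Answer]:
    """Fall back to the raw text (naive substring search, for illustration)."""
    matches = tuple(
        doc_id for doc_id, text in DOCUMENTS.items()
        if query.lower() in text.lower()
    )
    if not matches:
        return None
    return Answer(text=DOCUMENTS[matches[0]], layer="document", sources=matches)


def route(query: str) -> Optional[Answer]:
    """Structured-first: use the record if it exists, else read the source."""
    answer = structured_layer(query)
    if answer is not None:
        return answer
    return document_layer(query)


print(route("ENT-0001/revenue/FY2023").layer)  # structured
print(route("currency risk").layer)            # document
```

In both cases the caller receives a single answer with its sources attached, so the choice of layer stays an internal detail.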
This pairing — structured records as the working layer, documents as the permanent, queryable ground beneath them — is what lets Orbit be both fast and complete. The agents in the chapters that follow are built on top of both. Where a structured answer exists, they use it, which is what makes their work consistent and economical at scale; where a question reaches past it, the document layer is there, by design, to answer from the source.