Document Fetching Guidance
Environment Setup
Specification
Orbit will provide a folder under the orbit-data-provider AWS S3 bucket. Orbit will also provide an AWS S3 Key Pair (an access key ID and a secret access key) for accessing the data in that folder.
Your Case
In your case:
The AWS S3 folder will be: s3://orbit-data-provider/clients/jpmorgan/
The Key Pair will be provided in a separate file.
How to Fetch Data
Specification
Because the raw report data is very large, Orbit provides only the report index in the client folder. Once clients have the index, they can download the raw reports as needed. The data is updated in real time, and clients can use any SDK (such as boto3 for Python) to fetch the data they need.
The SDK will leverage the Key Pair to get permissions for accessing the data.
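One way to hand the Key Pair to the SDK without hard-coding it in source files is to read it from environment variables. The sketch below is only an illustration: the helper name s3_client_kwargs is hypothetical, and it assumes the Key Pair has been exported under the standard AWS variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

```python
import os


def s3_client_kwargs(env=None) -> dict:
    """Build keyword arguments for boto3.client('s3', ...) from environment variables.

    Hypothetical helper: assumes the Key Pair was exported as
    AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
    """
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


# Usage (requires boto3 to be installed):
#   import boto3
#   s3_client = boto3.client("s3", **s3_client_kwargs())
```

boto3 also picks up these two variables on its own when no keys are passed explicitly, so the helper mainly serves to fail early with a clear message when the Key Pair is not configured.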
Get data
Below is a sample of the data we deliver (in real deliveries, the data is tailored to each client's specific requirements).
Each file represents a report.
The file name is the current UTC time followed by the report ID (for example, 20241108072151_f_gNEoQE9TllGsQhHjAIambp.json), which makes the files easy to filter with the AWS SDK.
Each report (each line) contains presigned URL keys, such as presigned_url_4_file, for downloading the raw report.
{
    "report_id": "f_7UrMt7SKXYWoDKIWjZpOCb",
    "reported_at": "2025-04-04",
    "report_title": "DEF 14A",
    "report_type_id_list": [
        "10178"
    ],
    "company_info": [
        {
            "orbit_id": "1-4295904557",
            "company_name": "MORGAN STANLEY",
            "isin": [
                "US61747S5047"
            ],
            "ticker": [
                "MS"
            ],
            "country": [
                "US"
            ]
        }
    ],
    "attachments": [
        {
            "s3_path": "s3://filing-reports/reports-data/stock_us/2025/04/04/edgar-data-895421-000114036125012302-ny20039620x1_def14a.htm.pdf",
            "presigned_url_4_file": "https://filing-reports.s3.amazonaws.com/reports-data/stock_us/2025/04/04/edgar-data-895421-000114036125012302-ny20039620x1_def14a.htm.pdf?AWSAccessKeyId=AKIAZ2SDT5DU46K54RGA&Signature=DBaDcO1qTdYPzcxwQa5TFd8tYGg%3D&Expires=1749017986",
            "presigned_url_4_pages": "https://filing-reports.s3.amazonaws.com/txt-vector/reports-data/stock_us/2025/04/04/edgar-data-895421-000114036125012302-ny20039620x1_def14a.htm.pdf/pages.txt?AWSAccessKeyId=AKIAZ2SDT5DU46K54RGA&Signature=e9JyxeMDEQyklR9LhQTnLc82gH8%3D&Expires=1749017986",
            "presigned_url_4_blocks": "https://filing-reports.s3.amazonaws.com/txt-vector/reports-data/stock_us/2025/04/04/edgar-data-895421-000114036125012302-ny20039620x1_def14a.htm.pdf/blocks.txt?AWSAccessKeyId=AKIAZ2SDT5DU46K54RGA&Signature=qGcmz4k6JPXSKWUoQyUOW7FWLmY%3D&Expires=1749017986",
            "presigned_url_4_pages_vector": "https://filing-reports.s3.amazonaws.com/txt-vector/reports-data/stock_us/2025/04/04/edgar-data-895421-000114036125012302-ny20039620x1_def14a.htm.pdf/pages.txt.vector?AWSAccessKeyId=AKIAZ2SDT5DU46K54RGA&Signature=%2FIauqGI%2BuEHA0ym5s1%2Bsy4pUDaA%3D&Expires=1749017986",
            "presigned_url_4_blocks_vector": "https://filing-reports.s3.amazonaws.com/txt-vector/reports-data/stock_us/2025/04/04/edgar-data-895421-000114036125012302-ny20039620x1_def14a.htm.pdf/blocks.txt.vector?AWSAccessKeyId=AKIAZ2SDT5DU46K54RGA&Signature=jcWGfstmaGhF0N0BU7uhrY9cdMQ%3D&Expires=1749017986"
        }
    ],
    "x_version": 1
}
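Once an index record like the sample above has been parsed into a dictionary, downloading the raw report could look roughly like the sketch below. The helper names attachment_urls and download_report are hypothetical. Only the Python standard library is used, because a presigned URL can be fetched with a plain HTTP GET and needs no AWS credentials.

```python
import shutil
import urllib.request


def attachment_urls(report: dict, url_key: str = "presigned_url_4_file") -> list:
    """Collect one presigned URL per attachment of a parsed index record.

    Attachments that lack the requested key (or have an empty value) are skipped.
    """
    return [
        attachment[url_key]
        for attachment in report.get("attachments", [])
        if attachment.get(url_key)
    ]


def download_report(url: str, destination: str) -> None:
    """Stream the object behind a presigned URL to a local file."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


# Usage, given a parsed index record named `report`:
#   for i, url in enumerate(attachment_urls(report)):
#       download_report(url, f"{report['report_id']}_{i}.pdf")
```

Passing a different url_key, for example presigned_url_4_pages, selects one of the other files listed for the same attachment.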
Clients can read the data programmatically.
The code below uses the boto3 SDK for Python.
import json
import boto3

# Create an S3 client with the Key Pair provided by Orbit
s3_client = boto3.client('s3', aws_access_key_id="your key id", aws_secret_access_key="your secret key")

bucket_name = 'orbit-data-provider'
prefix = 'clients/abc/'  # Your owned data folder
# Example index file, named <UTC time>_<report id>.json
key = prefix + "streaming/20241108072151_f_gNEoQE9TllGsQhHjAIambp.json"

response = s3_client.get_object(Bucket=bucket_name, Key=key)
file_content = response['Body'].read().decode('utf-8')
print(json.loads(file_content))  # Parse the JSON content of the index file
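Because each file name starts with a UTC timestamp, the keys sort chronologically, so new index files can be found by listing the folder from a given point in time. The sketch below is one possible approach, not a prescribed one. It assumes the timestamp format YYYYMMDDHHMMSS and the streaming/ subfolder, both inferred from the sample key above, and the helper names timestamp_prefix and list_index_keys_since are hypothetical. The client is passed in as a parameter, for example one created with boto3.client('s3', ...).

```python
from datetime import datetime, timezone


def timestamp_prefix(moment: datetime) -> str:
    """Format a timezone-aware datetime the way index file names start (assumed YYYYMMDDHHMMSS in UTC)."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def list_index_keys_since(s3_client, bucket: str, folder: str, since: datetime) -> list:
    """Return the keys under `folder` whose file names start at or after the `since` timestamp.

    `folder` must end with '/', for example 'clients/abc/streaming/'.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket,
        Prefix=folder,
        # S3 returns keys that sort strictly after this string, which includes
        # every file name that begins with the timestamp or a later one.
        StartAfter=folder + timestamp_prefix(since),
    )
    keys = []
    for page in pages:
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Usage:
#   since = datetime(2024, 11, 8, 7, 21, 51, tzinfo=timezone.utc)
#   keys = list_index_keys_since(s3_client, "orbit-data-provider", "clients/abc/streaming/", since)
```

Remembering the last key that was processed and using it as the next starting point would be a natural way to poll for new reports without reading the whole folder again.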
⚠️ Please note that a presigned_url typically expires after 7 days. You can regenerate it by using the s3_path key in the index file. Below is the sample code.
import re
import boto3

s3_client = boto3.client('s3', aws_access_key_id="your key id", aws_secret_access_key="your secret key")

# Matches s3://<bucket>/<object key>
S3_PATH_RE = re.compile(r"s3://([a-zA-Z0-9.\-_]+)/(.+)")

def gen_n_files_presign_url(s3_path):
    # Build a fresh presigned download URL from the s3_path value of an attachment
    s3_path_obj = s3_split_path(s3_path)
    presigned_url_pdf = s3_client.generate_presigned_url(
        'get_object',
        Params={
            'Bucket': s3_path_obj["bucket"],
            'Key': s3_path_obj["store_path"]
        },
        ExpiresIn=604800  # 7 days, in seconds
    )
    return presigned_url_pdf

# Tool method: split an s3:// path into the bucket name and the object key
def s3_split_path(s3_path: str):
    match = S3_PATH_RE.fullmatch(s3_path)
    if match is None:
        raise ValueError("Invalid s3 path format.")
    return {
        'bucket': match.group(1),
        'store_path': match.group(2),
    }
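To decide when a URL needs to be regenerated, the expiry time can be read from the URL itself. The sketch below is an illustration under stated assumptions: the helper names presigned_url_expiry and needs_refresh are hypothetical, the Expires parameter is treated as a Unix timestamp as in the sample index above, and the X-Amz-Date plus X-Amz-Expires pair covers URLs signed with AWS Signature Version 4.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str):
    """Return the expiry time (UTC) encoded in a presigned URL, or None if it cannot be found."""
    query = parse_qs(urlparse(url).query)
    if "Expires" in query:
        # Style seen in the sample index: Expires holds a Unix timestamp.
        return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    if "X-Amz-Date" in query and "X-Amz-Expires" in query:
        # Signature Version 4 style: signing time plus a lifetime in seconds.
        signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
        return signed_at.replace(tzinfo=timezone.utc) + lifetime
    return None


def needs_refresh(url: str, now=None, margin=timedelta(minutes=5)) -> bool:
    """Return True when the URL has expired, expires within `margin`, or has no readable expiry."""
    expiry = presigned_url_expiry(url)
    if expiry is None:
        return True
    now = datetime.now(timezone.utc) if now is None else now
    return now + margin >= expiry


# Usage, given an attachment dictionary from the index:
#   url = attachment["presigned_url_4_file"]
#   if needs_refresh(url):
#       url = gen_n_files_presign_url(attachment["s3_path"])
```

The five-minute margin is an arbitrary default chosen for the example; a larger value may suit long downloads.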