Precise Fetching in Tiered Storage
Fast Analytics on Object Storage
Unlike lazy loading, which transfers large volumes of unnecessary data, precise fetching minimizes data movement between object storage and the query engine, delivering interactive performance without waste.
Reduce costs, increase query speeds

Better Performance, Lower Cost

Minimized Data Movement

True Storage Decoupling

StarTree minimizes inefficient data movement
Object storage isn’t inherently slow; the bottleneck is inefficient data movement. Most OLAP systems use ‘lazy loading’: they download entire partitions even when a query needs only a small portion of that data, resulting in:
- Excessive data movement: Queries retrieve more data than needed, wasting bandwidth
- Increased latency: Users wait longer for data transfers to complete
- Higher costs: Processing unnecessary data consumes more computational resources
Overview
How Precise Fetching works
Columnar Database
Selective columnar fetch
Unlike systems that must retrieve entire partitions with all columns (in Pinot, partitions are called ‘segments’), StarTree can selectively fetch only the columns or indexes needed for a specific query.
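To make the idea concrete, here is a minimal sketch of a column-selective fetch. It is not StarTree's implementation: `FakeObjectStore`, `write_segment`, `fetch_columns`, and the JSON-encoded layout are all hypothetical stand-ins, chosen only to show how a per-column byte-range map lets a query skip columns it never references.

```python
import json


class FakeObjectStore:
    """In-memory stand-in for an object store that supports ranged reads."""

    def __init__(self):
        self.objects = {}
        self.bytes_read = 0  # running total of bytes transferred

    def put(self, key, data):
        self.objects[key] = data

    def get_range(self, key, start, length):
        self.bytes_read += length
        return self.objects[key][start:start + length]


def write_segment(columns):
    """Lay columns out back to back; return the blob and a column -> (offset, length) map."""
    blob = b""
    layout = {}
    for name, values in columns.items():
        encoded = json.dumps(values).encode()
        layout[name] = (len(blob), len(encoded))
        blob += encoded
    return blob, layout


def fetch_columns(store, key, layout, needed):
    """Fetch only the byte ranges of the columns the query references."""
    result = {}
    for name in needed:
        offset, length = layout[name]
        result[name] = json.loads(store.get_range(key, offset, length))
    return result


store = FakeObjectStore()
blob, layout = write_segment({
    "country": ["US", "DE", "US", "IN"],
    "clicks": [3, 5, 2, 7],
    "user_agent": ["x" * 200] * 4,  # wide column the query never touches
})
store.put("segment_0", blob)

# Equivalent of: SELECT SUM(clicks) WHERE country = 'US'
cols = fetch_columns(store, "segment_0", layout, ["country", "clicks"])
total = sum(c for country, c in zip(cols["country"], cols["clicks"]) if country == "US")
print(total, store.bytes_read, len(blob))
```

In this toy layout the wide `user_agent` column is never transferred, so `store.bytes_read` stays well below the size of the whole segment.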
Precise
Block-level reads
Beyond column-level precision, StarTree reads only the specific blocks containing matching data. After applying filters, it fetches just the relevant blocks within columns, dramatically reducing data transfer volume.
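The block-level step can be sketched the same way. The example below assumes a column split into fixed-size blocks and a list of matching row IDs already produced by a filter or index; `BLOCK_SIZE`, `blocks_for_rows`, and `read_matching_values` are hypothetical names used for illustration, not StarTree APIs.

```python
BLOCK_SIZE = 4  # rows per block; an arbitrary value for this sketch


def blocks_for_rows(matching_rows, block_size=BLOCK_SIZE):
    """Return the sorted IDs of the blocks that contain at least one matching row."""
    return sorted({row // block_size for row in matching_rows})


def read_matching_values(fetch_block, matching_rows, block_size=BLOCK_SIZE):
    """Read values for matching rows, fetching each needed block at most once."""
    cache = {}
    values = []
    for row in sorted(matching_rows):
        block_id = row // block_size
        if block_id not in cache:
            cache[block_id] = fetch_block(block_id)
        values.append(cache[block_id][row % block_size])
    return values


# A 20-row column stored as 5 blocks of 4 rows each.
column = list(range(100, 120))
fetched = []  # records which blocks were actually transferred


def fetch_block(block_id):
    fetched.append(block_id)
    start = block_id * BLOCK_SIZE
    return column[start:start + BLOCK_SIZE]


matching_rows = [1, 2, 17]  # row IDs that survived the filter
values = read_matching_values(fetch_block, matching_rows)
print(values, fetched)
```

Here only two of the five blocks are read, because the three matching rows fall into blocks 0 and 4.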
Parallel
Pipeline execution
StarTree decouples data fetching from execution, beginning retrieval during query planning. By pipelining I/O and processing in parallel, StarTree reduces query latency by 5x or more.
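As a rough illustration of overlapping I/O with processing, the sketch below submits every block fetch up front and processes each block as soon as it arrives. It uses Python's standard `concurrent.futures` module; `fetch_block`, `process_block`, and the 50 ms simulated round trip are assumptions for the example and say nothing about StarTree's actual engine or its measured speedups.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_block(block_id):
    """Simulate one object-storage round trip."""
    time.sleep(0.05)
    return [block_id * 10 + i for i in range(3)]


def process_block(values):
    """Stand-in for query execution over one block."""
    return sum(values)


def run_sequential(block_ids):
    """Fetch, then process, one block at a time."""
    return sum(process_block(fetch_block(b)) for b in block_ids)


def run_pipelined(block_ids, max_workers=4):
    """Start all fetches immediately and process blocks in completion order."""
    total = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_block, b) for b in block_ids]
        for future in as_completed(futures):
            total += process_block(future.result())
    return total


block_ids = [0, 1, 2, 3]
sequential_total = run_sequential(block_ids)
pipelined_total = run_pipelined(block_ids)
print(sequential_total, pipelined_total)
```

Both paths return the same answer; the pipelined version simply avoids waiting for each round trip to finish before starting the next one.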
Query Optimizations
Index pinning & pruning
StarTree uses metadata (min/max values, bloom filters) to skip irrelevant segments entirely. It selectively pins small, frequently used structures locally and leverages specialized indexes to identify only the most relevant data blocks, keeping queries fast even on object storage.
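A simplified sketch of metadata-based pruning follows. It assumes each segment carries a min/max timestamp and a small bloom filter over one column, all kept locally ("pinned") so that pruning needs no object-storage reads. The `BloomFilter` class, `build_metadata`, and `segments_to_scan` are hypothetical and written only for illustration; a bloom filter can return false positives but never false negatives, so a segment that contains the value is never skipped.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_metadata(name, timestamps, user_ids):
    """Build the small, locally pinned metadata for one segment."""
    bloom = BloomFilter()
    for uid in user_ids:
        bloom.add(uid)
    return {"name": name, "min_ts": min(timestamps), "max_ts": max(timestamps), "bloom": bloom}


def segments_to_scan(metadata, ts_from, ts_to, user_id):
    """Return the segments that cannot be ruled out by metadata alone."""
    keep = []
    for meta in metadata:
        if meta["max_ts"] < ts_from or meta["min_ts"] > ts_to:
            continue  # time range cannot overlap the query
        if not meta["bloom"].might_contain(user_id):
            continue  # value is definitely absent from this segment
        keep.append(meta["name"])
    return keep


metadata = [
    build_metadata("seg_a", [100, 199], ["u1", "u2"]),
    build_metadata("seg_b", [200, 299], ["u3", "u4"]),
    build_metadata("seg_c", [300, 399], ["u3", "u5"]),
]
selected = segments_to_scan(metadata, 200, 399, "u3")
print(selected)
```

In this example `seg_a` is skipped on its min/max range alone, while `seg_b` and `seg_c` are kept because both overlap the time range and contain the requested user ID.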

“We needed to empower our publishers to easily understand and optimize their content’s revenue impact. We’ve partnered with StarTree to move from 24-48 hour delayed data to near real-time data. We can return data in less than a second on key metrics around revenue, clicks, and page views.”

