APACHE PINOT - with APACHE ICEBERG

Sub-second analytics on the data lake!

They said it couldn't be done.

With Apache Pinot on StarTree Cloud, you can now reliably serve low-latency, high-concurrency analytics directly on data in Apache Iceberg.
Book a Demo

The lakehouse solved storage. It didn’t solve performance.

Open table formats like Iceberg, Delta Lake and Hudi promise a modern, open data platform where data is stored once, governed once, and accessible by various engines purpose-built for different workloads.

The missing piece of the puzzle has been how to query that data with reliably interactive performance.

Apache Pinot on StarTree Cloud has now solved this. Perform interactive analytics directly on your lakehouse — with predictability, at high concurrency, at scale.
How it works!

The power of Apache Pinot, brought to the lake!

By extending Apache Pinot’s index-first architecture to Iceberg, StarTree is able to precisely fetch only the data needed to answer a query — reducing scans, lowering cost, and delivering fast responses directly on the lakehouse.
  1. Prune aggressively: StarTree uses Iceberg metadata and column statistics to narrow the query to the smallest relevant set of files and segments as early as possible.
  2. Narrow further with filters and indexes: Bloom filters and Pinot indexes reduce the search space even more, helping StarTree find exactly which data blocks matter.
  3. Fetch precisely: Instead of reading whole files or coarse column chunks, StarTree fetches only the relevant Parquet pages required for the query.
  4. Execute efficiently: Intelligent prefetching, hierarchical caching, and a custom Parquet reader help keep performance fast and predictable, even on externally stored data.
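
The first three steps can be sketched in a few lines of Python. This is a toy illustration of statistics-based pruning in general, not StarTree's or Pinot's implementation: the `DataFile` and `Page` classes, the `plan_point_lookup` function, and the file names are all hypothetical, and a plain Python set stands in for a Bloom filter.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One Parquet-style page with min/max statistics for a single column."""
    page_id: int
    min_value: int
    max_value: int


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file tracked in table metadata."""
    path: str
    min_value: int  # file-level column statistics
    max_value: int
    # Assumption: a plain set stands in for a Bloom filter. A real Bloom
    # filter is probabilistic: false positives are possible, false
    # negatives are not.
    membership: set = field(default_factory=set)
    pages: list = field(default_factory=list)


def plan_point_lookup(files, value):
    """Return (path, page_id) pairs that may contain `value`."""
    selected = []
    for f in files:
        # Step 1: prune files whose min/max range cannot contain the value.
        if not (f.min_value <= value <= f.max_value):
            continue
        # Step 2: narrow further with the membership filter.
        if value not in f.membership:
            continue
        # Step 3: keep only the pages whose range can contain the value.
        for p in f.pages:
            if p.min_value <= value <= p.max_value:
                selected.append((f.path, p.page_id))
    return selected


files = [
    DataFile("a.parquet", 0, 99, {5, 42}, [Page(0, 0, 49), Page(1, 50, 99)]),
    DataFile("b.parquet", 100, 199, {150}, [Page(0, 100, 199)]),
    DataFile("c.parquet", 0, 199, {7}, [Page(0, 0, 199)]),
]

print(plan_point_lookup(files, 42))  # [('a.parquet', 0)]
```

In this toy data set, a lookup for the value 42 touches one page out of four across three files. Step 4 (prefetching and caching) is left out of the sketch.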
PERFORMANCE with Predictability

500+ QPS on a 1 TB Iceberg table with sub-second response times

In initial benchmark tests, queries with StarTree on Apache Iceberg ran at high concurrency, and almost as fast as with Pinot on local storage!
View the Benchmarks
A DIFFERENT APPROACH

Avoid the traps of pre-computation and materialized views

The traditional approach to fast analytics has been to copy data out of the lake into external OLAP systems with pre-computed tables or materialized views. But the larger the data grows, the more that approach breaks down:
Limited flexibility
Queries you didn't precompute fall back to a slow path or require new pipelines. This is a major constraint for exploratory analytics and evolving product features.
Combinatorial explosion
To let users filter by many dimensions and group by arbitrary fields, the number of required materializations explodes. The options are either to precompute more (costly) or to limit queries.
Pipeline complexity
Materialized views require additional jobs, refresh logic, dependency management, and backfills. This introduces operational overhead, failure modes, and longer iteration cycles.
Data duplication
Storing aggregated data separately increases storage, compute, and management costs. Views must be updated and may lag behind source data. Even with streaming, complexity increases significantly.
StarTree doesn’t require you to precompute every question your users might ask. It makes raw Iceberg data fast enough to answer them interactively.
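
To put a number on the combinatorial explosion described above, here is a minimal sketch that counts every combination of dimensions a user could group by. The dimension names and the `grouping_sets` helper are hypothetical, and the count assumes one materialization per grouping, before filters are even considered.

```python
from itertools import combinations


def grouping_sets(dimensions):
    """Enumerate every subset of dimensions a user could group by."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


dims = ["country", "device", "browser", "plan"]  # hypothetical dimensions
print(len(grouping_sets(dims)))  # 16, i.e. 2**4 possible groupings
print(2 ** 20)                   # 1048576 groupings for 20 dimensions
```

Each added dimension doubles the count, which is why precomputing every grouping stops being practical well before a schema reaches a few dozen dimensions.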
WHAT WILL YOU BUILD?

Ideal for projects that already have data in Iceberg — and need it to do more

StarTree is a strong fit for projects that need to serve analytics directly from the lakehouse without building or maintaining a separate high-performance serving stack.

Interactive Analytics

Exploratory analytics on the data lake doesn’t need to be slow. Give users the power to explore data interactively, filter by many dimensions, and ask new questions as data structures evolve.

Deep-Dive Observability

Give anomaly detection and observability systems the power and performance to run at high query volumes with minimal latency. Slice and dice data, investigate issues interactively, and handle high concurrency and bursty traffic, all with predictable p95/p99 latencies.

Customer & Agent Facing Data Products

Build customer-facing apps and agent-facing data products without building new data pipelines or custom data stores. Ideal for prototyping new products or tapping into historical data that was previously limited to delayed reports.

Book a demo

Try StarTree with your lakehouse data

See what fast analytics on Iceberg looks like in your environment.

Book a demo to chat about your lakehouse demands and query patterns, and to learn more about how StarTree can help you deliver faster analytics with less complexity.

