APACHE PINOT WITH APACHE ICEBERG

Sub-second analytics on the data lake!

They said it couldn't be done.

With Apache Pinot on StarTree Cloud, you can now reliably serve low-latency, high-concurrency analytics directly on data in Apache Iceberg.


Open lakehouse storage. Interactive analytics performance.

One simpler architecture!

The lakehouse solved storage. It didn’t solve performance

Open table formats like Iceberg, Delta Lake, and Hudi promise a modern, open data platform where data is stored once, governed once, and accessible to many engines, each purpose-built for different workloads.

The missing piece of the puzzle has been querying that data with reliably interactive performance.

Apache Pinot on StarTree Cloud has now solved this. Perform interactive analytics directly on your lakehouse — with predictability, at high concurrency, at scale.

HOW IT WORKS

The power of Apache Pinot, brought to the lake!

By extending Apache Pinot’s index-first architecture to Iceberg, StarTree fetches only the data needed to answer a query, which reduces scans, lowers cost, and delivers fast responses directly on the lakehouse.

Prune aggressively: StarTree uses Iceberg metadata and column statistics to narrow the query to the smallest relevant set of files and segments as early as possible.
Narrow further with filters and indexes: Bloom filters and Pinot indexes reduce the search space even more, helping StarTree find exactly which data blocks matter.
Fetch precisely: Instead of reading whole files or coarse column chunks, StarTree fetches only the Parquet pages the query requires.
Execute efficiently: Intelligent prefetching, hierarchical caching, and a custom Parquet reader help keep performance fast and predictable, even on externally stored data.
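To make the pruning steps above concrete, here is a toy Python sketch of the general technique for a single equality predicate: drop files whose min/max statistics exclude the value, drop files whose Bloom filter rules the value out, then keep only the pages whose own min/max range could contain it. This is not StarTree's implementation. The classes `BloomFilter`, `Page`, and `DataFile` and the functions `plan_fetch` and `make_file` are hypothetical, and a real system would read these statistics from Iceberg manifests and Parquet metadata rather than build them in memory.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into a Python int

    def _positions(self, value):
        # Derive num_hashes bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class Page:
    page_id: int
    min_val: int  # page-level minimum (hypothetical stand-in for page stats)
    max_val: int  # page-level maximum


@dataclass
class DataFile:
    path: str
    min_val: int  # file-level lower bound (stand-in for table metadata stats)
    max_val: int  # file-level upper bound
    bloom: BloomFilter
    pages: list = field(default_factory=list)


def plan_fetch(files, value):
    """Return (path, page_id) pairs that may hold rows where column == value."""
    to_fetch = []
    for f in files:
        # 1. Prune whole files whose min/max range excludes the value.
        if not f.min_val <= value <= f.max_val:
            continue
        # 2. Skip files whose Bloom filter proves the value is absent.
        if not f.bloom.might_contain(value):
            continue
        # 3. Keep only pages whose min/max range could contain the value.
        for p in f.pages:
            if p.min_val <= value <= p.max_val:
                to_fetch.append((f.path, p.page_id))
    return to_fetch


def make_file(path, values, page_size=4):
    """Build hypothetical file, page, and Bloom statistics for a list of values."""
    bloom = BloomFilter()
    for v in values:
        bloom.add(v)
    pages = [
        Page(i // page_size, min(values[i:i + page_size]), max(values[i:i + page_size]))
        for i in range(0, len(values), page_size)
    ]
    return DataFile(path, min(values), max(values), bloom, pages)


files = [
    make_file("a.parquet", list(range(0, 16))),
    make_file("b.parquet", list(range(100, 116))),
]
print(plan_fetch(files, 105))  # prints [('b.parquet', 1)]
```

The ordering is the point of the design: the cheap metadata checks run first, so only the few pages that survive them would need to be read from object storage.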

PERFORMANCE WITH PREDICTABILITY

500+ QPS on a 1 TB Iceberg table with sub-second response times

In initial benchmark tests, queries with StarTree on Apache Iceberg ran at high concurrency and almost as fast as Pinot on local storage!
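One way to read "500+ QPS with sub-second response times" is through Little's law: average queries in flight equals throughput multiplied by average latency. The sketch below only does that arithmetic for 500 QPS; the latency values are illustrative assumptions, not benchmark results, and `avg_in_flight` is a hypothetical helper.

```python
def avg_in_flight(qps: float, avg_latency_s: float) -> float:
    """Little's law: average concurrency = arrival rate * average time in system."""
    return qps * avg_latency_s


# Illustrative average latencies only; these are assumptions, not measured results.
for latency_ms in (100, 250, 500, 1000):
    in_flight = avg_in_flight(500, latency_ms / 1000)
    print(f"{latency_ms} ms -> {in_flight:.0f} queries in flight on average")
```

So as long as average latency stays under one second, a sustained 500 QPS corresponds to fewer than 500 concurrent queries on average.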


A DIFFERENT APPROACH

More than a materialized view of the lake

The traditional approach to fast analytics has been to copy data out of the lake and into external OLAP systems with precomputed tables or materialized views.

But the bigger your lake gets, the worse this approach becomes!

Lack of flexibility

Only the queries you precomputed get fast answers; everything else falls back to a slow path or requires a new pipeline. This is a major constraint for exploratory analytics and evolving product features.

Combinatorial explosion

If users can filter by many dimensions and group by arbitrary fields, the number of required materializations explodes. Your options are to precompute too much (costly) or to limit the queries users can run.
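A quick way to see the explosion: if each precomputed view covers exactly one set of GROUP BY columns, a table with n groupable dimensions needs up to 2^n views to cover every combination, before filters or time granularities multiply the count further. The sketch below assumes that one-view-per-column-set model; the dimension names and the `group_by_sets` helper are hypothetical.

```python
from itertools import combinations


def group_by_sets(dimensions):
    """Every subset of dimensions a query could GROUP BY, including the empty set."""
    return [
        combo
        for k in range(len(dimensions) + 1)
        for combo in combinations(dimensions, k)
    ]


dims = ["country", "device", "browser"]  # hypothetical dimension names
print(len(group_by_sets(dims)))  # prints 8, i.e. 2 ** 3

# The count doubles with every added dimension.
for n in (5, 10, 20):
    print(n, 2 ** n)  # 32, 1024, 1048576 possible views respectively
```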

Pipeline complexity

Materialized views require additional jobs, refresh logic, dependency management, and backfills. This adds operational overhead, new failure modes, and longer iteration cycles.

Data duplication

Storing aggregated data separately increases storage, compute, and management costs. Views must be kept up to date and can lag behind the source data. Even with streaming, complexity increases significantly.

StarTree doesn’t require you to precompute every question your users might ask. It makes raw Iceberg data fast enough to answer them interactively.

WHAT WILL YOU BUILD?

Ideal for projects that already have data in Iceberg — and need it to do more

StarTree is a strong fit for projects that need to serve analytics directly from the lakehouse without building or maintaining a separate high-performance serving stack.

Interactive Analytics

Exploratory analytics on the data lake doesn’t need to be slow. Give users the power to explore data interactively, filter by many dimensions, and ask new questions as data structures evolve.

Deep-Dive Observability

Give anomaly detection and observability systems the power and performance to run at high query volumes with minimal latency. Slice and dice data to investigate issues interactively, handle high concurrency and bursty traffic, and do it all with predictable p95/p99 latencies.

Customer & Agent Facing Data Products

Build customer-facing apps and agent-facing data products without new data pipelines or custom data stores. Ideal for prototyping new products or for tapping into historical data that was previously limited to delayed reports.


Book a demo

Try StarTree with your lakehouse data

See what fast analytics on Iceberg looks like in your environment.

Book a demo to discuss your lakehouse requirements and query patterns, and to learn how StarTree can help you deliver faster analytics with less complexity.