StarTree Adds Native Iceberg Support: Serve High-Concurrency Queries Directly from Your Lakehouse
We’re excited to announce native Apache Iceberg support in StarTree Cloud—making it possible for companies to serve interactive, external-facing analytics directly from their lakehouse, with low latency and high concurrency, and without duplicating data or stitching together brittle pipelines.

For years, the data lakehouse has been a powerful architecture—but primarily for internal analytics. It was built to store massive volumes of historical data and support internal dashboards and data science workloads.
Exposing the lakehouse directly to external users has long been impractical. Traditional query engines struggle to deliver consistent performance at high concurrency. As a result, teams have resorted to building complex architectures—batch pipelines, pre-materialized views, and separate serving layers to meet user experience requirements. These workarounds introduce latency, cost, and operational overhead.
At the heart of many of these lakehouses is Apache Iceberg—an open table format designed for managing large-scale analytic datasets. Iceberg is gaining rapid adoption because it brings essential features like schema evolution, ACID transactions, and partition pruning to data stored in cloud object storage, without locking teams into a proprietary database format.
From Backend Store to Frontline Engine
Traditionally, building customer-facing data products on top of Iceberg required a multi-step pipeline:
- Aggregate your data along the required dimensions using a query engine optimized for batch processing.
- Stage and transform the output to fit the structure needed for fast access.
- Populate a key-value store or serving layer to enable low-latency, high-concurrency reads at scale.
Each step added latency, complexity, and cost.
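To make the legacy pattern concrete, here is a minimal sketch of that three-step pipeline using plain Python structures in place of a real batch engine and key-value store. Every name here is illustrative, not a StarTree or Pinot API:

```python
# Sketch of the legacy aggregate -> transform -> serve pipeline.
# Plain dicts stand in for the query engine and the KV serving store.
from collections import defaultdict

# Raw fact rows as they might sit in the lakehouse (hypothetical schema).
raw_events = [
    {"merchant_id": "m1", "day": "2024-06-01", "revenue": 120.0},
    {"merchant_id": "m1", "day": "2024-06-01", "revenue": 80.0},
    {"merchant_id": "m2", "day": "2024-06-01", "revenue": 50.0},
]

# Step 1: batch-aggregate along the dimensions the app will query.
totals = defaultdict(float)
for row in raw_events:
    totals[(row["merchant_id"], row["day"])] += row["revenue"]

# Step 2: reshape the output into the access pattern the app needs.
records = [{"key": f"{m}:{d}", "revenue": v} for (m, d), v in totals.items()]

# Step 3: load into a key-value serving store for low-latency reads.
kv_store = {r["key"]: r["revenue"] for r in records}

# The app can now answer point lookups quickly -- but every new question
# (a new dimension, a new metric) means re-running steps 1 through 3.
print(kv_store["m1:2024-06-01"])  # 200.0
```

The lookup itself is fast; the cost is everything upstream of it, repeated for each new query shape.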
Apache Pinot has always been valued for its ability to deliver fast, interactive queries at extremely high QPS—making it a top choice for powering customer-facing analytics at companies like Activision, Slack, HubSpot, DoorDash, and others.
But as the industry shifts toward open table formats like Apache Iceberg and away from proprietary storage formats dictated by individual databases and query engines, a challenge has emerged: moving data into Pinot’s native format raises the same concerns as loading it into a separate key-value store, adding complexity, latency, and operational overhead.
Not anymore.
With StarTree’s new Iceberg integration, we’ve bridged that gap. By combining Pinot’s advanced indexing and query performance with StarTree’s managed platform, companies can now serve insights directly from Iceberg tables—without transforming or duplicating data—while maintaining the speed and scale Pinot is known for.
That means you can build customer-facing data products that are:
- Fast: Sub-second query response times, even at high user volume
- Efficient: No reverse ETL, no duplicated storage
- Scalable: Thousands of concurrent queries, handled reliably and cost-effectively
You get the best of both worlds: an open data lakehouse architecture and a battle-tested engine for serving insights at speed and scale. With this release, the lakehouse isn’t just a warehouse anymore—it’s your data product engine.
What’s Under the Hood
StarTree’s Iceberg integration brings the following to your lakehouse stack:
Native Apache Iceberg and Parquet Integration
Previously, using Apache Pinot with Iceberg or Parquet data required extracting and converting that data into Pinot’s native file format—adding friction, latency, and operational complexity. With StarTree’s new native integration, that coupling is gone. You can now query Iceberg and Parquet tables directly, with no data transformation or duplication required. This fully decouples Pinot’s serving power from its historical storage constraints, allowing seamless compatibility with your lakehouse architecture while maintaining interactive performance and scale.
High-Performance Indexing Across Diverse Data Types
Pinot’s core strength—blazing-fast indexing—is extended to your Iceberg tables. StarTree applies columnar indexing across numeric, text, JSON, and geo fields, dramatically reducing scan cost and query latency. This makes previously sluggish datasets queryable in milliseconds, even at high throughput.
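For a sense of what this looks like in practice, Apache Pinot declares indexes per column in its table config. The snippet below is an illustrative fragment using standard Pinot index settings; the column names are hypothetical:

```json
{
  "tableIndexConfig": {
    "invertedIndexColumns": ["country"],
    "rangeIndexColumns": ["order_total"],
    "jsonIndexColumns": ["event_payload"]
  },
  "fieldConfigList": [
    {
      "name": "product_description",
      "encodingType": "RAW",
      "indexType": "TEXT"
    }
  ]
}
```

Each entry tells Pinot to build a purpose-specific index (inverted for equality filters, range for numeric predicates, JSON and text for semi-structured and full-text search) so queries can skip full scans.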
Intelligent Materialized Views via the StarTree Index
The StarTree Index dynamically pre-aggregates data along dimensions commonly queried by your application. Unlike rigid batch materialization, this system adapts to query patterns in production, making performance both scalable and efficient—even under large user loads.
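In Apache Pinot, a star-tree index is declared in the table config by listing the dimensions to pre-aggregate over and the metric aggregations to materialize. The fragment below is a sketch with hypothetical column names:

```json
{
  "tableIndexConfig": {
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["merchant_id", "country", "day"],
        "functionColumnPairs": ["SUM__revenue", "COUNT__*"],
        "maxLeafRecords": 10000
      }
    ]
  }
}
```

Queries that group or filter on those dimensions are answered from the pre-aggregated tree nodes instead of raw rows, with `maxLeafRecords` bounding how much scanning any single lookup can incur.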
Local Caching and Data Pruning
StarTree uses localized caching at the segment level and aggressive pruning to avoid touching unnecessary partitions or files. That means even large tables can deliver low-latency responses without bottlenecks at query time.
Precise Fetching and Read Optimization
Rather than lazily scanning Iceberg metadata or files, StarTree proactively and precisely prefetches relevant data blocks based on access patterns, reducing I/O waits and improving SLA consistency for external users.
Together, these capabilities mean you can serve interactive queries at scale—directly from your lakehouse—without data duplication, transformation, or additional infrastructure. This dramatically reduces architectural complexity and operational overhead while delivering the speed and concurrency your applications demand.
Build the Next Generation of Data Products—From Your Lakehouse
This capability unlocks entirely new possibilities for companies that want to turn their internal data into customer-facing products:
- Financial platforms can offer merchants self-service insights into cash flow, churn risk, and revenue trends—improving transparency and driving retention.
- Logistics providers can build user-facing portals that show delivery performance, route efficiency, and service benchmarks—enhancing the customer experience and reducing support costs.
- Gaming companies can create in-game dashboards that surface historical player stats, achievements, and progress—keeping players engaged and encouraging deeper play.
By making data products fast, scalable, and easy to build, businesses can now deliver the kind of experiences customers expect—without the infrastructure tradeoffs that once made this impossible.
Available in Private Preview
StarTree’s native Apache Iceberg support is available in private preview in StarTree Cloud. If you’re ready to simplify your stack and bring your lakehouse data to the customers and agents who need it, request to speak with an expert to get started.