
Apache Pinot in 2026

Haven’t looked at Pinot in a few years? A lot has changed.

It’s been over 10 years since Pinot was first open-sourced, and almost five years since it graduated to a top-level Apache project. In that time, Pinot has evolved into a battle-hardened engine for serving a wide variety of low-latency analytics workloads, even at high concurrency.

Pinot now supports real-time and batch-backed tables, joins, window functions, upserts, schema evolution, JSON indexing, and a broad indexing toolkit. StarTree builds on the open-source (OSS) foundation with managed operations and extensions aimed at harder production problems like very large upsert metadata, tiered storage, Iceberg/lakehouse access, and high-cardinality remote filtering.

In 2026, Pinot is a strong fit whenever analytics are part of the product experience: customer-facing dashboards, fraud/risk decisions, observability, personalization, usage analytics, or operational workflows where p95/p99 latency and concurrency matter. It is less about replacing a warehouse for every analytical query and more about serving fast, predictable analytics where slow queries become a product problem.

Let’s dive into some of the most notable improvements:

Can Apache Pinot do joins?

Yes. Pinot offers robust join capabilities with multiple optimization strategies.

Joins run on Pinot’s multi-stage query engine (MSQE). This means Pinot supports not only the join syntax but also a distributed execution path designed for the job. The MSQE supports multiple join strategies: co-located joins, distributed hash joins, broadcast joins, and lookup joins. It handles both fact-to-fact and fact-to-dimension joins.

An important nuance is that join performance still depends on physical layout, not just the query syntax. The Pinot docs cover join strategies in depth, including how to think about query optimization and table layout together. Joins are available, but you should still think like a distributed systems engineer when you want them to be fast and predictable.
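To make the layout point concrete, here is a minimal Python sketch of why co-location helps. It is a toy model, not Pinot code: when both tables are partitioned on the join key with the same function and the same partition count, each partition can be joined locally and no rows need to move between partitions. The table contents, the `partition_of` helper, and the partition count are all hypothetical.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption: both tables use the same partition count


def partition_of(key: str) -> int:
    """Toy deterministic partition function (not Pinot's actual partitioner)."""
    return sum(key.encode("utf-8")) % NUM_PARTITIONS


def partition_rows(rows, key_field):
    """Group rows by the partition that owns their join key."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_of(row[key_field])].append(row)
    return parts


def colocated_join(left_rows, right_rows, key_field):
    """Join partition by partition; no row crosses a partition boundary."""
    left_parts = partition_rows(left_rows, key_field)
    right_parts = partition_rows(right_rows, key_field)
    joined = []
    for pid, left in left_parts.items():
        # Build a hash table over the matching local partition only.
        lookup = defaultdict(list)
        for right_row in right_parts.get(pid, []):
            lookup[right_row[key_field]].append(right_row)
        for left_row in left:
            for right_row in lookup.get(left_row[key_field], []):
                joined.append({**left_row, **right_row})
    return joined


orders = [
    {"user_id": "u1", "amount": 30},
    {"user_id": "u2", "amount": 12},
    {"user_id": "u1", "amount": 5},
]
users = [
    {"user_id": "u1", "country": "US"},
    {"user_id": "u2", "country": "DE"},
]

result = colocated_join(orders, users, "user_id")
```

If the two tables were partitioned differently, the same join would first need a shuffle or a broadcast of one side, which is the extra cost that a good table layout avoids.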

StarTree customers rely on Pinot’s multi-stage engine in production for critical use cases including user-facing metrics, fraud detection, risk analysis, and more. Multi-stage queries now amount to roughly half a billion queries per week across all production environments.

How complete is Pinot SQL now?

Pinot uses the Apache Calcite SQL parser, a widely used SQL parser and query planning framework. Pinot SQL will be very familiar to engineers who use modern analytical SQL, and its surface has expanded significantly: CTEs, subqueries, joins, window functions, and many common functions are now supported.

The Pinot community is working towards PostgreSQL compatibility, but many analytical queries can already be run with minimal changes.
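As an illustration of the query shapes mentioned above, the following sketch combines a CTE with a window function. It is executed against an in-memory SQLite database purely so that the example runs without a cluster; the `orders` table and its columns are hypothetical, and function names or semantics in Pinot may differ in detail.

```python
import sqlite3

# A CTE computes daily revenue per country; a window function then
# adds a running total within each country.
QUERY = """
WITH daily AS (
    SELECT country, order_day, SUM(amount) AS revenue
    FROM orders
    GROUP BY country, order_day
)
SELECT country,
       order_day,
       revenue,
       SUM(revenue) OVER (PARTITION BY country ORDER BY order_day) AS running_revenue
FROM daily
ORDER BY country, order_day
"""

# SQLite is only a stand-in engine here so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (country TEXT, order_day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("DE", "2026-01-01", 10),
        ("DE", "2026-01-02", 5),
        ("US", "2026-01-01", 7),
        ("US", "2026-01-01", 3),
        ("US", "2026-01-02", 20),
    ],
)
rows = conn.execute(QUERY).fetchall()
conn.close()
```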

Does Pinot support upserts, or late-arriving data?

Upserts are where many OLAP systems struggle because they turn immutable analytical storage into mutable serving infrastructure. Apache Pinot supports real-time upserts, including full and partial upserts, deletes, configurable comparison columns, and offline-table upserts for batch-ingested corrections and replays.
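The semantics can be sketched in a few lines of Python. This is a simplified model of full upserts with a comparison column and a delete marker, not Pinot's implementation; the field names (`order_id`, `updated_at`, `deleted`) are hypothetical, and the tie-breaking rule is an assumption noted in the code.

```python
from typing import Any, Dict, List


def apply_upserts(events: List[Dict[str, Any]],
                  key_field: str = "order_id",
                  comparison_field: str = "updated_at",
                  delete_field: str = "deleted") -> Dict[Any, Dict[str, Any]]:
    """Return the latest visible record per primary key (full-upsert model)."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for event in events:  # events are processed in ingestion order
        key = event[key_field]
        current = latest.get(key)
        # Assumption: a record wins if its comparison value is >= the stored
        # one, so a late-arriving record with an older value is ignored.
        if current is None or event[comparison_field] >= current[comparison_field]:
            latest[key] = event
    # Keys whose winning version is a delete marker are hidden from queries.
    return {k: v for k, v in latest.items() if not v.get(delete_field, False)}


events = [
    {"order_id": "o1", "updated_at": 100, "status": "placed"},
    {"order_id": "o2", "updated_at": 100, "status": "placed"},
    {"order_id": "o1", "updated_at": 300, "status": "shipped"},
    {"order_id": "o1", "updated_at": 200, "status": "packed"},  # late arrival
    {"order_id": "o2", "updated_at": 150, "deleted": True},     # delete marker
]

visible = apply_upserts(events)
```

In this run, `o1` resolves to the `shipped` version even though the `packed` event arrived later, and `o2` disappears because its newest version is a delete.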

Mutable analytics is powerful, but managing it at scale presents real engineering challenges. OSS Pinot supports these workloads, but large upsert tables create operational pressure around metadata size, storage bloat, and compaction.

StarTree Cloud addresses upserts at scale with ‘off-heap’ upserts, which reduce memory pressure by persisting the upsert metadata to local disk. With StarTree, you can handle billions of upsert keys while guaranteeing business metric accuracy. The Segment Refresh Task merges and compacts obsolete versions in the background while keeping queries consistent. In addition to improving scalability, off-heap upserts can reduce infrastructure costs by up to 10x.

OSS Pinot can absolutely handle corrections, late arrivals, and mutable facts. StarTree makes those workloads cheaper and more stable at a larger scale.

Do you need to pre-aggregate or materialize data before loading it into Pinot?

Not necessarily. Pinot does not require a separate upstream pipeline to pre-materialize metrics ahead of time. Pinot’s indexing model provides flexible options for efficient queries on raw data. You can apply different indexing strategies to different columns, and adjust them as your needs change. Pinot supports a wide variety of index types including inverted, range, JSON, text, timestamp, and more.
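As a rough illustration of per-column indexing, the fragment below builds an index section of a table config as a Python dictionary. The column names are hypothetical, and the key names follow the `tableIndexConfig` layout as commonly documented; treat them as assumptions and check them against the table config reference for your Pinot version.

```python
import json

# Hypothetical columns; key names are assumptions to verify for your version.
table_index_config = {
    "sortedColumn": ["account_id"],
    "invertedIndexColumns": ["country", "device_type"],
    "rangeIndexColumns": ["latency_ms"],
    "bloomFilterColumns": ["session_id"],
    "jsonIndexColumns": ["payload"],
}


def indexes_by_column(config):
    """Invert the config so each column lists the index settings applied to it."""
    result = {}
    for setting, columns in config.items():
        for column in columns:
            result.setdefault(column, []).append(setting)
    return result


# Serialize the fragment the way it would appear inside a table config file.
config_json = json.dumps({"tableIndexConfig": table_index_config}, indent=2)
print(config_json)
```

Keeping the index choices in one declarative place makes it easier to revisit them as query patterns change.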

Pinot’s star-tree index enables real-time aggregation over billions of rows of raw data without the need to build materialized views. It does this by precomputing selected aggregation paths during segment generation, so aggregation-heavy queries over very large datasets can return in milliseconds when the query shape matches the configured dimensions and metrics.
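The idea of precomputed aggregation paths can be sketched with a toy model. The code below is not the star-tree data structure itself (which is a tree with configurable limits); it simply precomputes sums for every combination of a concrete value or a wildcard over two hypothetical dimensions, so that a matching query becomes a single lookup instead of a scan.

```python
from collections import defaultdict
from itertools import product

STAR = "*"  # wildcard meaning "all values of this dimension"
DIMENSIONS = ("country", "browser")  # assumption: the configured dimensions
METRIC = "impressions"               # assumption: the configured SUM metric


def build_preaggregates(rows):
    """Precompute SUM(METRIC) for each combination of concrete value or STAR."""
    cube = defaultdict(int)
    for row in rows:
        choices = [(row[d], STAR) for d in DIMENSIONS]
        for combo in product(*choices):
            cube[combo] += row[METRIC]
    return dict(cube)


def query_sum(cube, **filters):
    """Answer SUM(METRIC) with optional equality filters via one lookup."""
    unknown = set(filters) - set(DIMENSIONS)
    if unknown:
        # The query shape does not match the precomputed dimensions.
        raise KeyError(f"not covered by pre-aggregates: {sorted(unknown)}")
    key = tuple(filters.get(d, STAR) for d in DIMENSIONS)
    return cube.get(key, 0)


rows = [
    {"country": "US", "browser": "chrome", "impressions": 10},
    {"country": "US", "browser": "safari", "impressions": 5},
    {"country": "DE", "browser": "chrome", "impressions": 7},
]
cube = build_preaggregates(rows)
us_total = query_sum(cube, country="US")
```

A filter on a dimension outside the configured set raises an error in this sketch, which mirrors the point that the speedup applies only when the query shape matches the configuration.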

Pinot also supports two kinds of rollups: rollups applied during ingestion into real-time tables, and asynchronous rollups of historical data.

Pinot can also perform lightweight transformations out of the box, such as arithmetic and string operations, JSON flattening, and Groovy-based transformations. This removes friction from ingestion pipelines.
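The two fragments below sketch how ingestion-time transformations and ingestion-time rollups are typically declared in a table's ingestion config. They are shown as separate Python dictionaries; the column names are hypothetical, and the key and function names (`transformConfigs`, `aggregationConfigs`, `toEpochHours`, `jsonPathString`) are given as assumptions to verify against the ingestion documentation for your version.

```python
import json

# Fragment 1: derive columns from raw fields at ingestion time.
transform_fragment = {
    "transformConfigs": [
        {"columnName": "event_hour",
         "transformFunction": "toEpochHours(event_ts_millis)"},
        {"columnName": "user_id",
         "transformFunction": "jsonPathString(payload, '$.user.id')"},
    ]
}

# Fragment 2: roll up a metric while ingesting into a real-time table.
rollup_fragment = {
    "aggregationConfigs": [
        {"columnName": "total_amount", "aggregationFunction": "SUM(amount)"},
    ]
}


def derived_columns(fragment):
    """List the destination columns declared by an ingestion config fragment."""
    names = []
    for entries in fragment.values():
        names.extend(entry["columnName"] for entry in entries)
    return names


print(json.dumps({"ingestionConfig": transform_fragment}, indent=2))
print(json.dumps({"ingestionConfig": rollup_fragment}, indent=2))
```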

Pinot is often strongest when you ingest raw or lightly transformed events and then use indexes to make common filters and group-bys fast. Pre-aggregation can still be useful for some workloads, but it is an optimization choice, not a universal requirement.

StarTree Cloud builds on Pinot with managed indexing guidance and product features designed to reduce the cost of running Pinot at scale. When working with StarTree you’ll have guidance on ingestion, data management, and index-focused optimization.

Is Pinot a stream-only system?

Not at all. Pinot grew out of LinkedIn alongside the Kafka project, so it is well known for its tight integration with Kafka. But Pinot can also ingest from other real-time sources (such as Kinesis and Pulsar) and from a variety of batch sources.

Pinot is very good at serving fresh data from streams, but it also has a strong hybrid pattern for backfills, enrichments, corrections, and longer-term historical serving.

StarTree adds a variety of connectors for ingesting data from other popular data sources including Flink, Snowflake, BigQuery, and more.

Does Pinot work with tiered storage?

Pinot’s performance comes from reducing how much data has to be scanned and how quickly relevant structures can be accessed, not from assuming all data lives in heap memory. 

In OSS Pinot, storage and compute are still relatively tightly coupled. But with StarTree, storage can be decoupled from compute, enabling users to query entire tables across local and object storage while keeping only some segments local.

Can Pinot query lakehouse data?

New with StarTree in 2026, you can query Iceberg tables without having to ingest them, and in some cases queries run as fast as they do on native Pinot tables.

StarTree achieves this by leveraging Pinot’s advanced indexing capabilities to keep track of the precise columns and blocks required to answer a query. The query engine then needs to make fewer requests and transfer less data from object storage, instead of dragging whole partitions back into the query path. 
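One generic way to picture this kind of pruning is with per-block min/max statistics. The sketch below is a stand-in technique for illustration only and is not a description of StarTree's implementation; the `BlockStats` type, the file paths, and the timestamp column are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BlockStats:
    """Minimal metadata kept per remote block (hypothetical layout)."""
    path: str
    min_ts: int
    max_ts: int


def blocks_to_fetch(blocks: List[BlockStats], start_ts: int, end_ts: int) -> List[str]:
    """Return only the blocks whose [min_ts, max_ts] range overlaps the filter."""
    return [
        b.path
        for b in blocks
        if b.max_ts >= start_ts and b.min_ts <= end_ts
    ]


blocks = [
    BlockStats("part-0/block-0", min_ts=0, max_ts=99),
    BlockStats("part-0/block-1", min_ts=100, max_ts=199),
    BlockStats("part-1/block-0", min_ts=200, max_ts=299),
]

# A filter on ts BETWEEN 150 AND 220 touches two of the three blocks.
needed = blocks_to_fetch(blocks, start_ts=150, end_ts=220)
```

The fewer blocks survive this step, the fewer requests go to object storage, which is the effect the paragraph above describes.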

For well-laid-out Iceberg tables and query patterns that benefit from pruning, indexing, and caching, StarTree can deliver sub-second interactive analytics directly on the data lake. Benchmarks on representative lakehouse workloads show StarTree delivering sub-second latency at 500 QPS, with materially faster queries and lower query cost than other lakehouse query engines.

This is a big deal. It creates opportunities for organizations to power apps and agents directly from lakehouse data, without moving the data, and at much lower costs than might otherwise be possible.

Do schema changes require a rebuild?

Pinot supports dynamic schema evolution and index changes. Additive changes are the safe path, and the standard flow is to add columns, update config if needed, reload affected segments, and backfill only when the use case needs it. 

Indexes can also be changed on existing data via the reload API. Minion tasks handle re-indexing in the background.

While you shouldn’t expect arbitrary schema evolution with no consequences, you can easily add columns without forcing a full reload, and without downtime. Additive evolution is normal with Pinot. 
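The additive flow can be sketched as a small helper that adds a column to a schema document and refuses anything that is not purely additive. The schema section names (`dimensionFieldSpecs`, `metricFieldSpecs`, `dateTimeFieldSpecs`) and `defaultNullValue` follow the Pinot schema format as commonly documented; the schema contents and the helper itself are hypothetical.

```python
import copy

SECTIONS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def add_dimension_column(schema, name, data_type, default=None):
    """Return a copy of the schema with one new dimension column appended."""
    existing = {
        spec["name"]
        for section in SECTIONS
        for spec in schema.get(section, [])
    }
    if name in existing:
        # Changing an existing column is not an additive change.
        raise ValueError(f"column already exists: {name}")
    updated = copy.deepcopy(schema)
    spec = {"name": name, "dataType": data_type}
    if default is not None:
        spec["defaultNullValue"] = default
    updated.setdefault("dimensionFieldSpecs", []).append(spec)
    return updated


schema = {
    "schemaName": "orders",
    "dimensionFieldSpecs": [{"name": "country", "dataType": "STRING"}],
    "metricFieldSpecs": [{"name": "amount", "dataType": "LONG"}],
}

new_schema = add_dimension_column(schema, "channel", "STRING", default="unknown")
# Next steps (not executed here): upload the updated schema to the controller,
# then trigger a segment reload so existing segments pick up the new column.
```

Existing segments serve the default value for the new column until data is backfilled, which is why backfill is needed only when the use case depends on historical values.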

StarTree Data Portal further lowers the operational friction around ingestion, table management, and schema changes.

This is a good example of how StarTree expands on Apache Pinot in 2026. Apache Pinot provides the core real-time analytics engine. StarTree extends it for more demanding cloud-era serving patterns: larger mutable datasets, more cost-sensitive storage tiers, and production-friendly management for organizations that need Pinot’s performance without running every piece of the system by hand.

Can Pinot work with semi-structured data?

Pinot offers native support for storing, indexing, and querying semi-structured JSON data, enabling fast, real-time analytics on nested structures. It uses JSON indexes to accelerate filtering on JSON string columns, avoiding expensive full-scan operations, and supports SQL functions like JSON_MATCH and JSONPATH to query nested fields efficiently.
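A practical detail with JSON_MATCH is quoting: the filter expression is itself a SQL string literal, so single quotes inside it are doubled. The helper below is a small sketch that builds such a predicate; the `events` table, the `payload` column, and the helper names are hypothetical.

```python
def sql_quote(text: str) -> str:
    """Wrap text in single quotes, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def json_match_equals(column: str, json_path: str, value: str) -> str:
    """Build a JSON_MATCH predicate comparing a nested field to a string."""
    # Inner expression: a double-quoted JSON path compared to a quoted value.
    filter_expr = f'"{json_path}" = {sql_quote(value)}'
    # Outer layer: the whole expression becomes one SQL string literal.
    return f"JSON_MATCH({column}, {sql_quote(filter_expr)})"


predicate = json_match_equals("payload", "$.user.country", "US")
query = f"SELECT COUNT(*) FROM events WHERE {predicate}"
print(query)
```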

It’s still a good idea to evaluate whether the JSON indexing model matches your workload. Pinot’s JSON index docs explain supported column types, filtering patterns, distinct acceleration, and limitations.

StarTree offers a composite JSON index that accelerates complex JSON queries by enabling efficient filtering and retrieval of nested fields. This also reduces the index size and improves performance.

What’s the big deal with Pinot’s indexes?

Pinot has an expansive selection of indexes, including bloom filter, forward, FST, geospatial, inverted, JSON, range, star-tree, text, and timestamp indexes. That is a broad indexing toolkit, and it is central to how Pinot achieves low latency across different workload shapes.

For engineers coming from other OLAP systems, this is an important shift in how to think about performance. In Pinot, query latency is tightly connected to choosing the right physical design for your access patterns. The system is not “one index plus brute force.” It is a toolkit that lets you optimize for equality filters, ranges, text-like access, nested JSON filters, and aggregation-heavy paths.
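To show why the choice of index matters for equality filters, here is a toy inverted index in Python. It is a conceptual sketch rather than Pinot's on-disk format: each value maps to the set of row ids that contain it, so an equality filter is a lookup and a conjunction is a set intersection. The rows and column names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(rows, column):
    """Map each distinct value of a column to the ids of rows containing it."""
    index = defaultdict(set)
    for doc_id, row in enumerate(rows):
        index[row[column]].add(doc_id)
    return index


def filter_equals(indexes, **predicates):
    """AND together equality predicates by intersecting posting sets."""
    doc_ids = None
    for column, value in predicates.items():
        postings = indexes[column].get(value, set())
        doc_ids = postings if doc_ids is None else doc_ids & postings
    return sorted(doc_ids or [])


rows = [
    {"country": "US", "device": "mobile"},
    {"country": "DE", "device": "desktop"},
    {"country": "US", "device": "desktop"},
    {"country": "US", "device": "mobile"},
]
indexes = {col: build_inverted_index(rows, col) for col in ("country", "device")}

matches = filter_equals(indexes, country="US", device="mobile")
```

No row is inspected at query time in this model; the work was done once when the index was built, which is the trade-off behind choosing indexes per access pattern.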

StarTree adds to Pinot’s toolkit with sparse indexes aimed at high-cardinality equality filters, especially in tiered-storage environments where reducing remote reads matters. 

Pinot’s support for rich text search features is built using Lucene. On StarTree you can also take advantage of a native text index to further accelerate text search operations without relying on external libraries. It provides optimized performance for the most common text search patterns while reducing storage requirements.

Is Pinot operationally complex?

Pinot is much easier to get started with than it used to be, but production performance still benefits from thoughtful table design. StarTree reduces that operational burden through managed operations, defaults, recommendations, and automation.

StarTree’s Data Portal provides schema inference and automatic index recommendations. Automated capacity planning builds on deep operational insights from thousands of production deployments. The Minion framework takes care of automated background tasks (segment merging, rebalancing, re-indexing).

OSS Pinot can require special expertise to get it running well. But StarTree makes it accessible to data teams everywhere. 

Try Pinot in 2026 

Pinot is not a magic layer that makes every arbitrary analytical query fast. Like every low-latency OLAP system, it rewards thoughtful physical design: partitioning, sorting, indexing, segment sizing, retention, and query-shape awareness. The difference in 2026 is that Pinot’s toolkit is broader, and StarTree packages more of the operational expertise needed to use that toolkit in production.

We invite you to talk with one of our Pinot experts. We’ll be happy to answer any other questions you might have, and share our experience of how to make Pinot work for you.
