Multi-Stage Query Engine
Analyze your data at your speed
Run complex SQL queries without sacrificing fast query performance, and give your teams the business insights they need in real time.
Deliver fast and accurate insights across large datasets

Improve query performance

Allocate resources more effectively without increasing costs

Scale to handle increased data volumes and query complexity
Run complex queries with native query-time joins
Unlock timely and accurate insights from your data in motion and power your real-time analytics application.
Joins Support
Optimize join execution at scale with native support for serving joins at subsecond latency. The Multi-Stage Query Engine supports three join strategies: Broadcast Joins, Shuffle Hash Distributed Joins, and Lookup Joins. With these strategies, StarTree covers the full spectrum of join workloads, from user-facing analytics to ad hoc analytics.
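To make the three strategies concrete, here is a minimal, in-memory Python sketch of how each one moves data before joining. It is an illustration of the general techniques only, not StarTree's implementation; every function name and the row format (lists of dicts) are assumptions made for this example.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dict rows on `key` using an in-memory hash table."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def broadcast_join(left_partitions, right_rows, key):
    """Broadcast: every worker holding a left partition gets a full copy
    of the (small) right table and joins locally."""
    out = []
    for part in left_partitions:
        out.extend(hash_join(part, right_rows, key))
    return out


def shuffle_hash_join(left_rows, right_rows, key, num_workers):
    """Shuffle: rows from both sides are routed to workers by hash(key),
    so matching keys meet on the same worker, which then joins locally."""
    left_buckets = [[] for _ in range(num_workers)]
    right_buckets = [[] for _ in range(num_workers)]
    for row in left_rows:
        left_buckets[hash(row[key]) % num_workers].append(row)
    for row in right_rows:
        right_buckets[hash(row[key]) % num_workers].append(row)
    out = []
    for lb, rb in zip(left_buckets, right_buckets):
        out.extend(hash_join(lb, rb, key))
    return out


def lookup_join(left_rows, dim_index, key):
    """Lookup: probe a pre-built dimension index (hypothetical here: a dict
    keyed by the join column) that is already available on each worker."""
    return [{**l, **dim_index[l[key]]} for l in left_rows if l[key] in dim_index]
```

All three produce the same joined rows; they differ in how much data has to move. Broadcasting suits a small right-hand table, shuffling suits two large tables, and a lookup suits a dimension table that is already resident on every worker.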
Multi-Stage Execution Model
The multi-stage execution model is designed for complex, multi-step data processing. It adds an intermediate compute stage, consisting of a set of processing servers and a data exchange mechanism, which lets StarTree handle more complex processing requirements by offloading computation from the brokers.
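The flow described above can be sketched as a toy pipeline in Python: leaf servers scan and filter, an exchange routes rows by key, intermediate workers aggregate, and the broker only collects finished results. This is a simplified illustration under assumed names (`leaf_stage`, `exchange`, `intermediate_stage`, `broker_collect`), not the engine's actual code or API.

```python
from collections import Counter


def leaf_stage(segment, predicate):
    """Leaf server: scan a local segment and keep rows matching the filter."""
    return [row for row in segment if predicate(row)]


def exchange(rows_per_server, key, num_workers):
    """Data exchange: route each row to an intermediate worker by hash(key),
    so every value of `key` ends up on exactly one worker."""
    mailboxes = [[] for _ in range(num_workers)]
    for rows in rows_per_server:
        for row in rows:
            mailboxes[hash(row[key]) % num_workers].append(row)
    return mailboxes


def intermediate_stage(mailbox, key, value):
    """Intermediate worker: compute SUM(value) GROUP BY key for its rows."""
    totals = Counter()
    for row in mailbox:
        totals[row[key]] += row[value]
    return dict(totals)


def broker_collect(partials):
    """Broker: merge disjoint partial results; no heavy computation here."""
    result = {}
    for partial in partials:
        result.update(partial)
    return result


def run_query(segments, predicate, key, value, num_workers=2):
    """Run the three stages in order and return the grouped sums."""
    filtered = [leaf_stage(seg, predicate) for seg in segments]
    mailboxes = exchange(filtered, key, num_workers)
    partials = [intermediate_stage(mb, key, value) for mb in mailboxes]
    return broker_collect(partials)
```

Because the exchange sends every row for a given key to a single worker, the partial results are disjoint, and the broker's job shrinks to concatenating them.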
Indexing and Pruning
StarTree offers a rich set of indexing and pruning techniques to speed up query processing on individual tables. These techniques help reduce the overhead of scanning and aggregations to improve query latency and throughput.
Data Layout
Improve join performance by adapting the join to the data layout. StarTree optimizes joins for three cases: data that is partitioned and co-located, data that is partitioned but not co-located, and random layouts (data that is neither partitioned nor co-located).
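Why layout matters can be shown with a small Python sketch. When both tables are partitioned the same way on the join key and matching partitions sit together, each partition pair can be joined locally with no data movement. The `plan_join` rule at the end is a hypothetical illustration of layout-aware planning, not a description of StarTree's optimizer; integer join keys and all names here are assumptions.

```python
def partition(rows, key, n):
    """Split rows into n partitions by (integer) join key modulo n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


def local_join(left, right, key):
    """Plain hash join of two lists of dict rows on `key`."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def colocated_join(left_parts, right_parts, key):
    """Partitioned and co-located: join partition i with partition i only.
    No rows cross partition boundaries, so no exchange is needed."""
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(local_join(lp, rp, key))
    return out


def plan_join(partitioned, colocated):
    """Hypothetical planner rule mapping a layout to a data-movement plan."""
    if partitioned and colocated:
        return "local"             # join each partition pair in place
    if partitioned:
        return "align_partitions"  # move matching partitions together first
    return "shuffle"               # repartition both sides by join key
```

The partition-wise result is identical to joining the full tables, which is what makes skipping the shuffle safe when the layout allows it.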
“We migrated our old time series aggregation system from a different framework to Apache Pinot. With Pinot, we can ingest an entire stream of data and send multiple queries at it, and it will be equally performant. This offers a lot of flexibility — it allows us to lower the cost of operating, and opens up real-time analytics to folks that weren’t Scala experts but know how to write SQL queries.”

