Resources
Blog

Apache Pinot 101: Lesson 5 - Queries with SQL, Joins


Fasih
Fasih Khatib
Developer Advocate at StarTree
released on
April 23, 2025
READ TIME
10 mins

This is the fifth lesson in the Apache Pinot 101 Tutorial Series:

In the previous post you learned how to create an index to speed up queries. This lesson explores how Pinot executes queries. It covers the two query engines Pinot ships with – the single-stage query engine, and the multi-stage query engine. By the end of this lesson you’ll be writing queries to find the current value of the Bitcoin portfolio.

Single-stage Query Engine

Navigate to the query console on http://localhost:9000

Consider the following query which finds the current amount of Bitcoin in the portfolio.

SELECT 
SUM(CASE side WHEN 'buy' THEN size ELSE -1 * size END) AS amount
FROM portfolio;

This simple query adds all the buys and subtracts all the sells to find the final amount of coin that’s in the portfolio. This query can be executed by the single-stage query engine.

The single-stage query engine uses a scatter-gather approach. The broker receives the query and sends it to all the relevant servers. It receives the results from each of these servers and performs a final computation to combine all the intermediate results and arrive at the final result. This is then sent back to the user.

The single-stage query engine is limited to simpler single table queries. Queries involving joins, CTEs, and window functions are not supported by the single-stage engine. To write more complex queries, you’d use the multi-stage query engine.

Multi-stage Query Engine

Consider a query that finds the value of the portfolio. This is derived by multiplying the amount of Bitcoin with the latest price. The query below shows how to compute this.

WITH coin AS (
  SELECT SUM(
      CASE
        side
        WHEN 'buy' THEN size
        ELSE -1 * size
      END
    ) AS amount
  FROM portfolio
),
price AS (
  SELECT price
  FROM ticker
  ORDER BY time_ms DESC
  LIMIT 1
)
SELECT price.price * coin.amount AS value
FROM coin, price

This query includes CTEs and joins, and can be executed using the multi-stage query engine. To use the multi-stage query engine, enable the “Use Multi-Stage Engine” checkbox on the query console.

In contrast to the single-stage engine, the multi-stage engine sends the query execution plan to the relevant servers. As the servers execute on the plan, they send intermediate results to other servers. Finally, all the partial computation is sent to one of the servers which performs any remaining computation before sending the result back to the user.

Ok, now you understand queries, let’s see how to visualize that data using Superset.

Next Lesson: Visualization with Superset →


Ready to deploy real-time analytics?

We’re here to help!