Apache Pinot 101: Lesson 5 - Queries with SQL, Joins

This is the fifth lesson in the Apache Pinot 101 Tutorial Series:
- Lesson 1: Up and Running with Docker
- Lesson 2: Ingesting Data with Kafka
- Lesson 3: Ingesting Batch Data
- Lesson 4: Indexes for Faster Queries
- Lesson 5: Queries with SQL, Joins
- Lesson 6: Visualization with Superset
In the previous post you learned how to create an index to speed up queries. This lesson explores how Pinot executes queries. It covers the two query engines Pinot ships with – the single-stage query engine, and the multi-stage query engine. By the end of this lesson you’ll be writing queries to find the current value of the Bitcoin portfolio.
Single-stage Query Engine
Navigate to the query console on http://localhost:9000
Consider the following query which finds the current amount of Bitcoin in the portfolio.
SELECT
SUM(CASE side WHEN 'buy' THEN size ELSE -1 * size END) AS amount
FROM portfolio;
This simple query adds all the buys and subtracts all the sells to find the final amount of coin that’s in the portfolio. This query can be executed by the single-stage query engine.
The single-stage query engine uses a scatter-gather approach. The broker receives the query and sends it to all the relevant servers. It receives the results from each of these servers and performs a final computation to combine all the intermediate results and arrive at the final result. This is then sent back to the user.
The single-stage query engine is limited to simpler single table queries. Queries involving joins, CTEs, and window functions are not supported by the single-stage engine. To write more complex queries, you’d use the multi-stage query engine.
Multi-stage Query Engine
Consider a query that finds the value of the portfolio. This is derived by multiplying the amount of Bitcoin with the latest price. The query below shows how to compute this.
WITH coin AS (
SELECT SUM(
CASE
side
WHEN 'buy' THEN size
ELSE -1 * size
END
) AS amount
FROM portfolio
),
price AS (
SELECT price
FROM ticker
ORDER BY time_ms DESC
LIMIT 1
)
SELECT price.price * coin.amount AS value
FROM coin, price
This query includes CTEs and joins, and can be executed using the multi-stage query engine. To use the multi-stage query engine, enable the “Use Multi-Stage Engine” checkbox on the query console.
In contrast to the single-stage engine, the multi-stage engine sends the query execution plan to the relevant servers. As the servers execute on the plan, they send intermediate results to other servers. Finally, all the partial computation is sent to one of the servers which performs any remaining computation before sending the result back to the user.
Ok, now you understand queries, let’s see how to visualize that data using Superset.
Next Lesson: Visualization with Superset →