Getting Started with Apache Pinot: Lesson 5 – Queries with SQL, Joins

This is the fifth lesson in the Apache Pinot 101 Tutorial Series:

In the previous post you learned how to create an index to speed up queries. This lesson explores how Pinot executes queries. It covers the two query engines Pinot ships with – the single-stage query engine, and the multi-stage query engine. By the end of this lesson you’ll be writing queries to find the current value of the Bitcoin portfolio.

Single-stage Query Engine

Navigate to the query console on http://localhost:9000

Consider the following query which finds the current amount of Bitcoin in the portfolio.

SELECT 
SUM(CASE side WHEN 'buy' THEN size ELSE -1 * size END) AS amount
FROM portfolio;Code language: PHP (php)

This simple query adds all the buys and subtracts all the sells to find the final amount of coin that’s in the portfolio. This query can be executed by the single-stage query engine.

The single-stage query engine uses a scatter-gather approach. The broker receives the query and sends it to all the relevant servers. It receives the results from each of these servers and performs a final computation to combine all the intermediate results and arrive at the final result. This is then sent back to the user.

The single-stage query engine is limited to simpler single table queries. Queries involving joins, CTEs, and window functions are not supported by the single-stage engine. To write more complex queries, you’d use the multi-stage query engine.

Multi-stage Query Engine

Consider a query that finds the value of the portfolio. This is derived by multiplying the amount of Bitcoin with the latest price. The query below shows how to compute this.

WITH coin AS (
  SELECT SUM(
      CASE
        side
        WHEN 'buy' THEN size
        ELSE -1 * size
      END
    ) AS amount
  FROM portfolio
),
price AS (
  SELECT price
  FROM ticker
  ORDER BY time_ms DESC
  LIMIT 1
)
SELECT price.price * coin.amount AS value
FROM coin, priceCode language: PHP (php)

This query includes CTEs and joins, and can be executed using the multi-stage query engine. To use the multi-stage query engine, enable the “Use Multi-Stage Engine” checkbox on the query console.

In contrast to the single-stage engine, the multi-stage engine sends the query execution plan to the relevant servers. As the servers execute on the plan, they send intermediate results to other servers. Finally, all the partial computation is sent to one of the servers which performs any remaining computation before sending the result back to the user.

Ok, now you understand queries, let’s see how to visualize that data using Superset.

Next Lesson: Visualization with Superset →



Contents
Share