Introducing StarTree’s Free Course on Apache Pinot

We’re excited to share that our free introductory course on Apache Pinot is now available! Apache Pinot 101 is your quick guide to understanding all things Pinot: server and broker functions, complex query processing, various UI options, and more.

Join Tim Berglund, StarTree’s VP of Developer Relations, as he walks you through an overview of Pinot, its architecture, indexes, ingestion models, multi-stage query engine, and upserts. With easily digestible lessons that range from 5 to 17 minutes, this course is ideal for beginning your adventure in real-time analytics with Apache Pinot.

What does the Apache Pinot 101 course cover?

Discover the world of Apache Pinot, a real-time OLAP database, through a series of fast-paced videos developed by our resident Pinot experts. Understand how Pinot excels in low-latency, high-concurrency analytical workloads on streaming data, and its unique approach to real-time analytics in terms of latency, concurrency, and freshness.

The full course includes videos on Pinot’s architecture, functionality, and ingestion, each covered in the sections below.

To get started, check out this 8-minute introduction to Apache Pinot. Learn about Pinot’s distributed architecture, optimization for analytical tasks, and how major companies like LinkedIn and Uber utilize it in their operations.

Architecture

Now that you’ve covered the basics, take a deeper dive into the key components of Apache Pinot’s architecture. We’ve created courses covering tables and segments, servers and brokers, cluster management, and minions.

Architecture: Tables and Segments

Start your journey into Pinot’s architecture with a look at its core abstraction: tables. Tim explains how tables in Apache Pinot are designed for scalability, handling massive amounts of data with ease. Learn about the simplicity of Pinot’s JSON-based table schema, the distinction between real-time, offline, and hybrid tables, and the unique columnar data storage approach that optimizes Pinot for analytical workloads.
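To make the JSON-based schema a little more concrete, here is a minimal sketch in Python that builds a schema document and serializes it. The table name `orders` and every column name are hypothetical, invented for this example; the top-level keys follow the field-spec layout Pinot schemas generally use, but treat the details as an illustration rather than a reference.

```python
import json

# Hypothetical schema for an illustrative "orders" table.
# All table and column names here are invented for this example.
orders_schema = {
    "schemaName": "orders",
    "dimensionFieldSpecs": [
        {"name": "order_id", "dataType": "STRING"},
        {"name": "status", "dataType": "STRING"},
    ],
    "metricFieldSpecs": [
        {"name": "amount", "dataType": "DOUBLE"},
    ],
    "dateTimeFieldSpecs": [
        {
            "name": "updated_at",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS",
        }
    ],
}


def column_names(schema):
    """Collect every column name declared across the field-spec lists."""
    names = []
    for key in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs"):
        names.extend(spec["name"] for spec in schema.get(key, []))
    return names


# Serialize to the JSON text you would keep alongside the table config.
schema_json = json.dumps(orders_schema, indent=2)
```

Dimensions, metrics, and time columns are declared separately, which hints at the columnar, analytics-first design the video describes.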

Architecture: Servers and Brokers

In the module on servers and brokers, learn about the crucial role servers play in hosting segments and processing queries. Understand the practical distinction between real-time and offline servers, and how replication works in Pinot’s architecture. This module also explores the function of brokers in query processing, the difference between single-stage and multi-stage queries, and the importance of the Deep Store in Pinot’s ecosystem.
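The division of labor between brokers and servers can be sketched as a toy scatter-gather in Python. This is not Pinot code: the `servers` dictionary, the function names, and the sample rows are all assumptions made up for illustration, and real brokers also handle routing, replica selection, and timeouts.

```python
from collections import Counter

# Toy cluster: each "server" hosts a few segments, and each segment is a
# list of rows. Server names and data are invented for this example.
servers = {
    "server-1": [[{"city": "SF", "amount": 10}, {"city": "NY", "amount": 5}]],
    "server-2": [[{"city": "SF", "amount": 7}], [{"city": "LA", "amount": 3}]],
}


def server_execute(segments):
    """Server side: partial SUM(amount) GROUP BY city over local segments."""
    partial = Counter()
    for segment in segments:
        for row in segment:
            partial[row["city"]] += row["amount"]
    return partial


def broker_query(cluster):
    """Broker side: scatter to every server, then merge the partial results."""
    merged = Counter()
    for segments in cluster.values():
        merged.update(server_execute(segments))  # Counter.update adds counts
    return dict(merged)


result = broker_query(servers)
```

Each server only touches the segments it hosts, and the broker's job reduces to combining small partial results, which is roughly the shape of a single-stage query.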

Architecture: Cluster Management

Interested in learning more about cluster management in Apache Pinot? Discover the pivotal role of the controller in managing the cluster, including node failure recovery, segment replication, and assignment. See how the Apache Helix controller maintains global metadata and interfaces with ZooKeeper for durable state storage. Additionally, learn about Pinot’s native support for multi-tenancy through hardware isolation and tenant tagging, ensuring data segregation for optimized performance.

Architecture: Minions

Learn how minions, worker nodes that run separately from the main cluster components, handle offloaded tasks such as segment creation, merging, and purging for offline tables. This session explores how minions execute computationally intensive tasks without burdening the primary servers, and how they integrate with the overall Pinot framework. Dive into the extensibility of Pinot through the MinionEventObserver and PinotTaskExecutor interfaces, which let you customize minion tasks within your Pinot cluster.

Functionality

How do indexes and upserts work in Apache Pinot? How do we support SQL joins using a multi-stage query engine? Learn about key elements of Pinot’s functionality in our courses on indexes, the multi-stage query engine, and upserts.

Indexes

How do Apache Pinot’s indexing mechanisms work? Tim delves into the distinct features of various indexes like the forward index, inverted index, range index, bloom filter, text indexes, JSON index, timestamp index, geospatial index, and the star-tree index. Understanding these indexes is vital for enhancing data retrieval efficiency and making strategic choices in the configuration of your Pinot tables.
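As a rough illustration of why an inverted index speeds up filtering, the Python sketch below maps each column value to the document IDs that contain it. It is a simplification under stated assumptions: the column data and function names are hypothetical, and Pinot’s actual implementation uses dictionary encoding and compressed bitmaps rather than plain lists.

```python
from collections import defaultdict


def build_inverted_index(column_values):
    """Map each distinct value to the sorted list of doc IDs that hold it."""
    index = defaultdict(list)
    for doc_id, value in enumerate(column_values):
        index[value].append(doc_id)
    return dict(index)


def lookup(index, value):
    """Return matching doc IDs without scanning the whole column."""
    return index.get(value, [])


# The list itself plays the role of a forward index: doc ID -> value.
status_column = ["OPEN", "SHIPPED", "OPEN", "CANCELLED", "SHIPPED"]
inverted = build_inverted_index(status_column)

open_docs = lookup(inverted, "OPEN")
```

A filter such as `status = 'OPEN'` becomes a single dictionary lookup instead of a scan over every row, at the cost of extra storage for the index.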

Multi-Stage Query Engine

How does Apache Pinot execute low-latency queries on both batch and streaming data with high concurrency? Discover how Pinot’s architecture, fundamentally optimized for filtering and aggregation, excels in efficiently processing typical query patterns. Additionally, this module dives deep into the multi-stage query engine, a vital feature in Pinot 1.0, which expands capabilities for handling complex queries like inner joins, thereby enhancing the range of use cases and SQL support in Pinot.

Upsert

Discover the advanced functionality of upserts and deletes in Apache Pinot, designed to handle mutable entities and rapidly changing data. Learn how Pinot efficiently manages these changes in real-time tables through an in-memory metadata map, enabling the seamless updating of existing data and the integration of deletes. This session explains how these capabilities are crucial for maintaining accurate, real-time analytics, especially when dealing with dynamic data streams like order status changes or change data capture feeds.
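The idea of an in-memory metadata map can be sketched in a few lines of Python. This is a toy model, not Pinot’s implementation: the class name `UpsertTable`, the `deleted` flag, and the order records are assumptions invented for the example. Rows are only ever appended; the map tracks which row is the latest for each primary key.

```python
class UpsertTable:
    """Toy append-only table with a primary-key map for upserts and deletes."""

    def __init__(self):
        self.rows = []              # append-only storage, indexed by doc ID
        self.latest = {}            # primary key -> (comparison value, doc ID)
        self.valid_doc_ids = set()  # doc IDs that queries are allowed to see

    def ingest(self, key, ts, record):
        doc_id = len(self.rows)
        self.rows.append(record)
        current = self.latest.get(key)
        if current is not None and current[0] > ts:
            return  # out-of-order record: stored, but never made visible
        if current is not None:
            self.valid_doc_ids.discard(current[1])  # hide the older version
        self.latest[key] = (ts, doc_id)
        if not record.get("deleted", False):
            self.valid_doc_ids.add(doc_id)

    def query(self):
        return [self.rows[d] for d in sorted(self.valid_doc_ids)]


table = UpsertTable()
table.ingest("o1", 1, {"order_id": "o1", "status": "PLACED"})
table.ingest("o2", 2, {"order_id": "o2", "status": "PLACED"})
table.ingest("o1", 3, {"order_id": "o1", "status": "SHIPPED"})
table.ingest("o2", 4, {"order_id": "o2", "deleted": True})
visible = table.query()
```

After these four events, only the latest version of order `o1` is visible, and `o2` has disappeared from query results even though all four rows remain in storage.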

Ingestion

Learn all about the process of data ingestion in Apache Pinot, focusing on two primary methods: batch (offline) and streaming (real-time).

Batch Ingestion

The course on batch ingestion in Apache Pinot covers everything from data storage locations to the creation of segments and their integration into the cluster. Learn about the critical roles of schemas, table configuration, minion tasks, controllers, and servers in efficiently managing and storing data in Pinot’s offline tables.

Streaming Ingestion

In the module on streaming ingestion, explore how Pinot integrates with streaming sources like Apache Kafka and Amazon Kinesis. The module follows streaming data from ingestion to a queryable state, emphasizing the continuous consumption of data, segment creation, and the crucial roles of consuming segments, servers, and controllers in maintaining data freshness and availability.

View the complete course

Catch up on the full course by visiting our Apache Pinot 101 playlist on YouTube. We created the classes with your busy schedule in mind, so it should only take you about an hour to watch all 11 videos!

Learn more about Apache Pinot

Want to learn about Apache Pinot in person during a dedicated day of training? StarTree plans to offer an Apache Pinot training day during Real-Time Analytics Summit 2024, taking place May 8-9 in San Jose, California. The workshop will take you through the inner workings of Pinot, and help you set up and run local Pinot and Kafka clusters. Learn more about Real-Time Analytics Summit 2024.

You can also check out these resources for more information about Apache Pinot and the latest news in the community:

Join the Slack channel
Get started with Apache Pinot
Register for Real-Time Analytics Summit 2024 to meet the community in person

Interested in jumping straight into Pinot? We recommend trying our fully managed version of Apache Pinot for a hassle-free setup and experience. Start querying in minutes with our free trial of StarTree Cloud.
