What is Apache Pinot?
Apache Pinot is an open source distributed database designed for real-time, user-facing analytics. It is classified as an Online Analytical Processing (OLAP) database and can execute queries with low latency even at extremely high throughput. Apache Pinot can ingest directly from event streaming sources such as Apache Kafka and make events available for querying immediately. It can also ingest data from Online Transaction Processing (OLTP) databases using Change Data Capture (CDC), or from batch data sources such as data warehouses and cloud object stores. Queries are written in a subset of SQL.
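To make the SQL interface concrete, here is a minimal sketch of sending a query to a Pinot broker over HTTP using only the Python standard library. It assumes a broker reachable at `http://localhost:8099` exposing the `/query/sql` endpoint, which accepts a JSON body of the form `{"sql": "..."}` and returns rows under `resultTable`. The broker address and the `pageviews` table used below are assumptions for illustration; adjust them for your own deployment.

```python
import json
import urllib.request

# Assumed broker address; change host/port to match your Pinot deployment.
BROKER_URL = "http://localhost:8099/query/sql"


def build_query_request(sql: str, url: str = BROKER_URL) -> urllib.request.Request:
    """Build an HTTP POST request that carries the SQL statement as JSON."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_from_response(payload: dict) -> list:
    """Convert the broker's resultTable into a list of {column: value} dicts."""
    table = payload["resultTable"]
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table["rows"]]


def run_query(sql: str) -> list:
    """Send the query to the broker and return its rows (needs a running broker)."""
    with urllib.request.urlopen(build_query_request(sql)) as resp:
        return rows_from_response(json.load(resp))
```

With a broker running, a call such as `run_query("SELECT country, COUNT(*) FROM pageviews GROUP BY country")` would return one dict per result row. Keeping request construction and response parsing in separate functions makes both easy to test without a live cluster.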
Apache Pinot is a top-level project of the Apache Software Foundation (ASF).
Users can find out more about Apache Pinot at its official website: pinot.apache.org.
Advantages of Apache Pinot
Highly Scalable
Apache Pinot is a distributed system that can span thousands of nodes. These nodes act as a single system responding to query requests in unison.
High Performance
Apache Pinot can ingest real-time event streams of millions of events per second, making data queryable immediately with low-latency results and without relying on data caching.
High Concurrency
Apache Pinot supports user-facing analytics, running hundreds of thousands of simultaneous queries across your data without performance bottlenecks.
Fault Tolerant
Apache Pinot automatically replicates and distributes data between nodes, using Apache Helix for highly resilient cluster management.
Flexible Indexing
Apache Pinot supports multiple types of data indexing, including its unique star-tree index that allows user-tunable performance.
Fast Column Store
Apache Pinot stores data in a columnar format, perfect for highly efficient OLAP workloads. It also supports smart query routing and aggregation optimizations.
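The benefit of a columnar layout for analytical queries can be shown with a toy comparison. The sketch below is a simplified illustration, not Pinot's actual segment format: it stores the same hypothetical records row by row and column by column, then shows that an aggregation over the column layout only needs to read the columns the query references, while the row layout visits every field of every record.

```python
# Toy illustration of row-oriented vs column-oriented layouts.
# The records and field names are hypothetical; this is NOT Pinot's storage format.

rows = [
    {"country": "US", "device": "mobile", "views": 10},
    {"country": "IN", "device": "desktop", "views": 7},
    {"country": "US", "device": "desktop", "views": 3},
]


def to_columns(records):
    """Pivot a list of records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(records, metric):
    # Row layout: each whole record is visited even though one field is needed.
    return sum(record[metric] for record in records)


def sum_column_store(columns, metric):
    # Column layout: only the single list holding the metric is read.
    return sum(columns[metric])


def filtered_sum_column_store(columns, filter_col, filter_val, metric):
    # Only the filter column and the metric column are scanned; the
    # "device" column in this example is never touched.
    matching = [i for i, value in enumerate(columns[filter_col]) if value == filter_val]
    metric_values = columns[metric]
    return sum(metric_values[i] for i in matching)


columns = to_columns(rows)
```

Here `sum_column_store(columns, "views")` and `sum_row_store(rows, "views")` give the same answer, but the columnar version reads one contiguous list. Real column stores add compression and indexes on top of this basic idea.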
What Makes Apache Pinot so Fast?
Chapter 1: Query Lifecycle and Optimization Techniques
Chapter 2: The Power of Indexing
How Does Apache Pinot Compare to Other Real-Time Analytics Databases?
Discover More About Apache Pinot
How Can I Get Started?
Our developer website will help you get started with Apache Pinot
The official project website hosted at the Apache Software Foundation
Who uses Apache Pinot?
Apache Pinot is used across many organizations and industries. Here are just a few of the many organizations who have adopted Apache Pinot in production:
Social Media Platforms, Online Collaboration Tools, Banking and Financial Services, Video and Audio Streaming Services, Cybersecurity, Mobile Telephony, Retail Shopping and Payment Systems, Adtech and Martech, Transportation and Delivery Services
History of Apache Pinot
Running low-latency analytics on fresh, high-volume event data was becoming an increasingly urgent problem in data-intensive organizations that had already implemented real-time event streaming systems such as Apache Kafka, which could produce millions of events per second. These organizations needed complementary OLAP databases designed specifically for the kinds of real-time event streaming architectures that Apache Kafka enabled.
LinkedIn was the birthplace of Apache Kafka (incubated 2011, graduated 2012). Because Kafka was widely used inside LinkedIn, the company had both the demand for a real-time analytical database and the technical infrastructure to support one within an event streaming data architecture. A team headed by Kishore Gopalakrishna created Pinot in 2014. Its first use case was powering the “Who’s Viewed Your Profile” feature. It was first announced to the world in a 2014 blog post, “Real-time Analytics at Massive Scale with Pinot.” In 2015 the project was open sourced.
By 2018 Pinot had entered incubation at the Apache Software Foundation (ASF), from which it graduated in 2021 to become an Apache top-level project: Apache Pinot. By this time other pioneering organizations had begun to adopt it for their own use cases, including Amazon-Eero, DoorDash, Factual/Foursquare, LinkedIn, Stripe, Uber, Walmart, Weibo, and WePay.
Kishore Gopalakrishna eventually left LinkedIn, and, with a team of co-founders, created StarTree. StarTree Cloud is a Database-as-a-Service (DBaaS) powered by Apache Pinot, built to provide a fully managed platform for real-time analytics. By removing the burden of infrastructure management, companies can focus on delivering real-time insights to their end users.