Apache Pinot is an open source, distributed Online Analytical Processing (OLAP) database designed for real-time, user-facing analytics. It executes queries with low latency even at extremely high throughput. Apache Pinot can ingest directly from event streaming sources like Apache Kafka and make events available for querying immediately. It can also ingest data from Online Transaction Processing (OLTP) databases using Change Data Capture (CDC), or from batch data sources such as data warehouses or cloud object stores. Queries are written in a subset of SQL.
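As a rough illustration of what querying with SQL looks like in practice, here is a minimal Python sketch that posts a query to a Pinot broker over HTTP and turns the result into row dictionaries. It assumes a broker reachable at localhost:8099 exposing the /query/sql endpoint (the address used by a local quickstart; yours may differ) and a response carrying a resultTable with dataSchema.columnNames and rows. The table name pageviews and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed address of a locally running broker; adjust for a real cluster.
BROKER_URL = "http://localhost:8099/query/sql"


def build_query_request(sql: str, broker_url: str = BROKER_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying the SQL text as JSON."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        broker_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response: dict) -> list[dict]:
    """Pair each result row with the column names from the response schema."""
    table = response["resultTable"]
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table["rows"]]


def run_query(sql: str, broker_url: str = BROKER_URL) -> list[dict]:
    """Send the query to a running broker and return its rows."""
    with urllib.request.urlopen(build_query_request(sql, broker_url)) as resp:
        return rows_as_dicts(json.load(resp))
```

With a cluster running, a call such as run_query("SELECT country, COUNT(*) FROM pageviews GROUP BY country LIMIT 10") would return one dictionary per result row. Production code would typically use an official Pinot client library rather than raw HTTP.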
Apache Pinot is a distributed system that can span thousands of nodes. These nodes act as a single system responding to query requests in unison.
Apache Pinot can ingest real-time event streams of millions of events per second, making data queryable immediately with low-latency results and without data caching.
Apache Pinot supports user-facing analytics, running hundreds of thousands of simultaneous queries across your data without performance bottlenecks.
Apache Pinot automatically replicates and distributes data between nodes, using Apache Helix for highly resilient cluster management.
Apache Pinot supports multiple types of data indexing, including its unique star-tree index that allows user-tunable performance.
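The idea behind a pre-aggregating index like the star-tree can be sketched in a few lines of Python. This is a toy model, not Pinot's implementation: it stores sums for every combination of concrete dimension values and a "star" wildcard in a flat dictionary, whereas the real index is a tree that bounds how much is precomputed (which is where the tuning comes in). The function names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import product

STAR = "*"  # wildcard meaning "any value" of a dimension


def build_preaggregates(rows, dimensions, metric):
    """Precompute metric sums for every mix of concrete and STAR dimension values."""
    totals = defaultdict(int)
    for row in rows:
        # Each row contributes to its own value and to STAR in every dimension.
        choices = [(row[d], STAR) for d in dimensions]
        for key in product(*choices):
            totals[key] += row[metric]
    return dict(totals)


def lookup_sum(preagg, dimensions, filters):
    """Answer SUM(metric) under equality filters with one dictionary lookup."""
    key = tuple(filters.get(d, STAR) for d in dimensions)
    return preagg.get(key, 0)


rows = [
    {"country": "US", "browser": "chrome", "clicks": 3},
    {"country": "US", "browser": "firefox", "clicks": 2},
    {"country": "DE", "browser": "chrome", "clicks": 5},
]
dims = ["country", "browser"]
preagg = build_preaggregates(rows, dims, "clicks")

print(lookup_sum(preagg, dims, {"country": "US"}))      # 5
print(lookup_sum(preagg, dims, {"browser": "chrome"}))  # 8
print(lookup_sum(preagg, dims, {}))                     # 10
```

The trade-off shown here is the one the index lets you tune: more precomputed combinations mean faster aggregations but more storage, so materializing everything, as this toy does, is rarely what you want at scale.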
Apache Pinot stores data in a columnar format, which is well suited to highly efficient OLAP workloads. It also supports smart query routing and aggregation optimizations.
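To see why a columnar layout suits analytical queries, consider this small Python sketch. It only illustrates the general principle (an aggregation reads a single column rather than every field of every row) and says nothing about Pinot's actual on-disk segment format; the data and function names are made up for the example.

```python
def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def column_sum(columns, name):
    """Aggregate one metric by touching only that column's values."""
    return sum(columns[name])


rows = [
    {"user": "a", "country": "US", "latency_ms": 12},
    {"user": "b", "country": "DE", "latency_ms": 30},
    {"user": "c", "country": "US", "latency_ms": 18},
]
columns = to_columns(rows)
print(column_sum(columns, "latency_ms"))  # 60
```

In a row layout the same sum would have to walk past the user and country fields of every record; in the columnar layout they are never read.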
Apache Pinot is used in production by organizations across many industries, including:
Social Media Platforms, Online Collaboration Tools, Banking and Financial Services, Video and Audio Streaming Services, Cybersecurity, Mobile Telephony, Retail Shopping and Payment Systems, Adtech and Martech, and Transportation and Delivery Services
Traditional OLAP databases were oriented to batch processing of data: they received periodic dumps or syncs from Online Transaction Processing (OLTP) databases only every few hours or even once a day. Yet as the field of “big data” grew, the need to shorten cycle times and deal with orders of magnitude more data meant that old batch data methodologies gave way to newer systems with ever-shortening time windows for data updates. Software designers sped up this cycle through a process known as “microbatching,” which shortened the update cycle to every few seconds or minutes. But it still wasn’t “real-time.”
This was becoming an increasingly urgent problem to solve in data-intensive organizations that had already implemented real-time event streaming systems, such as Apache Kafka, which could produce millions of events per second. There needed to be complementary OLAP databases designed specifically to handle the kinds of real-time event streaming architectures Apache Kafka could enable.
LinkedIn was the birthplace of Apache Kafka (incubated 2011, graduated 2012). Kafka's widespread internal use at LinkedIn meant there was both a ready audience and the technical infrastructure for a real-time analytical database that integrated with an event streaming data architecture. A team headed by Kishore Gopalakrishna created Pinot in 2014. Its first use case was to power the “Who’s Viewed Your Profile” feature. It was first announced to the world in a 2014 blog post, “Real-time Analytics at Massive Scale with Pinot”. In 2015 the project was open sourced.
By 2018 Pinot had entered incubation at the Apache Software Foundation (ASF), from which it graduated in 2021, becoming an Apache top-level project: Apache Pinot. By this time other industry-leading organizations had begun to adopt it for their own use cases: Amazon-Eero, DoorDash, Factual/Foursquare, LinkedIn, Stripe, Uber, Walmart, Weibo, WePay, and others.
Kishore Gopalakrishna eventually left LinkedIn, and, with a team of co-founders, created StarTree. StarTree Cloud is a Database-as-a-Service (DBaaS) powered by Apache Pinot, built to provide a fully managed platform for real-time analytics. By removing the burden of infrastructure management, companies can focus on delivering real-time insights to their end users.