The full recording of the StarTree Apache Pinot meetup in SF in May 2023 featuring Stripe, DoorDash, and StarTree:
1. Intro to Real-Time Analytics with Pinot (Tim Berglund, StarTree) | San Francisco 2023
   Tim Berglund (VP of Developer Relations at StarTree) introduces Apache Pinot, an OLAP database designed for real-time analytics. He explains the background of Pinot's creation, discussing the shift from traditional monolithic architectures to an event-driven world and the resulting need for fast, real-time insights. Tim then walks through Pinot's architecture, covering data ingestion, segment storage, server distribution, and query processing, and outlines the indexing options Pinot offers. He concludes by noting that building systems for the event-driven world is still an area of active exploration, and invites further questions and discussion.
2. Building a Real-Time Analytics Platform (Lakshmi Rao, Stripe) | San Francisco 2023
   Lakshmi Rao, a software engineer at Stripe, discusses building a real-time analytics platform on Apache Pinot. Stripe, a payments infrastructure provider, uses Pinot to give users fast, accurate, and fresh data about their transactions. Pinot powers user-facing interactions and internal use cases, and supports various platform offerings within Stripe. Lakshmi covers the user experience, the process of deploying Pinot clusters, data ingestion, and observability. She closes with future plans: automating user onboarding, abstracting cluster details away from users, and contributing to the Pinot open-source community.
3. Supporting Multiple Pinot Use Cases at Scale (Will Gan, DoorDash) | San Francisco 2023
   Will Gan (Software Engineer, DoorDash) focuses on two use cases at DoorDash: Mx Portal Ads Campaign Reporting and Risk Platform Dashboarding. DoorDash uses Pinot for real-time analytics with low query latency. For the Mx Portal use case, the team uses hybrid tables to track impressions, clicks, and orders. Data volume posed challenges, but co-locating rows and separating REALTIME and OFFLINE servers improved performance. For the Risk Platform use case, dashboards were built with Superset on top of Pinot tables, and onboarding to Tiered Storage reduced the server count and improved resource usage. Data retention remains a challenge, and DoorDash is considering separate aggregated tables for older data.