Powering the future of Uber’s Marketplace and AVs

At RTA Summit 2025, Praveen Neppalli Naga, CTO at Uber, showed how real-time analytics is foundational at Uber: it powers one of the largest real-time distributed systems in production and is helping the company prepare for the rise of autonomous vehicles (AVs).

Uber has grown from a simple UberX/Uber Black product into a highly heterogeneous marketplace spanning rides, delivery, freight, fleets, enterprises, and now AVs. Today, the platform operates in 70+ countries and 10,000+ cities, serves 170M+ consumers, supports 8M+ earners, and processes ~33M trips per day, with peaks of 1M concurrent trips. At that scale, concurrency and fan-out become the core engineering bottlenecks.

Naga explains the “time value of data”. At Uber, seconds-old data powers safety, ETAs, pricing, and matching; minutes-old data determines marketplace balance (over/under supply); and historical data feeds ML models for personalization and long-term optimization. Different freshness tiers require different systems, but all must interoperate seamlessly.
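
The tiering idea can be sketched in a few lines of Python. This is only an illustration: the talk names the tiers but not their boundaries, so the cutoffs (60 seconds, 1 hour) and the function name are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoffs; the source describes the tiers, not exact boundaries.
SECONDS_TIER_MAX = timedelta(seconds=60)
MINUTES_TIER_MAX = timedelta(hours=1)


def freshness_tier(event_time: datetime, now: datetime) -> str:
    """Classify an event by its age into one of three freshness tiers."""
    age = now - event_time
    if age < timedelta(0):
        raise ValueError("event_time is in the future")
    if age <= SECONDS_TIER_MAX:
        return "seconds"     # safety, ETAs, pricing, matching
    if age <= MINUTES_TIER_MAX:
        return "minutes"     # marketplace balance (over/under supply)
    return "historical"      # ML models, long-term optimization


now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
for age in (timedelta(seconds=5), timedelta(minutes=10), timedelta(days=30)):
    print(age, "->", freshness_tier(now - age, now))
```

In practice each tier would map to a different serving system, which is why interoperability between them matters.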

The value of real-time data is visible in these two platforms:

  • Marketplace indexing, which continuously indexes drivers, couriers, and trips as locations change in seconds, enabling low-latency matching decisions.
  • Mapping and ETA systems, which combine historical routing data with real-time traffic, road conditions, and driver signals to drive pricing, routing, and dispatch accuracy.
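
To make the marketplace indexing idea concrete, here is a minimal in-memory sketch. It is not Uber's implementation: the fixed latitude/longitude grid, the 0.01-degree cell size, and the DriverIndex class are all assumptions chosen for brevity. It shows the two operations such an index needs: cheap updates as a driver moves, and a lookup of candidates near a rider.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size in degrees (about 1.1 km of latitude)


def cell_of(lat: float, lon: float) -> tuple:
    """Map a coordinate to the grid cell that contains it."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


class DriverIndex:
    """Hypothetical grid index keyed by cell, updated on every location ping."""

    def __init__(self):
        self._cell_to_drivers = defaultdict(set)
        self._driver_cell = {}

    def update(self, driver_id: str, lat: float, lon: float) -> None:
        new_cell = cell_of(lat, lon)
        old_cell = self._driver_cell.get(driver_id)
        if old_cell == new_cell:
            return  # still in the same cell, nothing to re-index
        if old_cell is not None:
            self._cell_to_drivers[old_cell].discard(driver_id)
        self._cell_to_drivers[new_cell].add(driver_id)
        self._driver_cell[driver_id] = new_cell

    def nearby(self, lat: float, lon: float, rings: int = 1) -> set:
        """Return drivers in the rider's cell and the surrounding rings of cells."""
        ci, cj = cell_of(lat, lon)
        found = set()
        for di in range(-rings, rings + 1):
            for dj in range(-rings, rings + 1):
                found |= self._cell_to_drivers.get((ci + di, cj + dj), set())
        return found


index = DriverIndex()
index.update("d1", 37.7750, -122.4190)
index.update("d2", 37.9000, -122.4190)
print(index.nearby(37.7752, -122.4188))  # d1 is a candidate, d2 is too far
```

A degree-based grid has cells whose width shrinks with latitude, so a production system would more likely use a purpose-built geospatial grid; the update-then-lookup pattern stays the same.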

Under the hood, Uber ingests ~8 trillion Kafka events per day into a massive data ecosystem. Real-time streams feed Pinot, which serves ~200M queries per day, while a large data lake (with Hudi) supports transactional ingestion and model training.
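
For a sense of scale, the daily totals can be converted to average per-second rates. This is back-of-the-envelope arithmetic on the figures quoted above; it assumes an even spread across the day, so real peaks would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

kafka_events_per_day = 8e12     # ~8 trillion, from the talk
pinot_queries_per_day = 200e6   # ~200M, from the talk

events_per_sec = kafka_events_per_day / SECONDS_PER_DAY
queries_per_sec = pinot_queries_per_day / SECONDS_PER_DAY

print(f"Kafka: ~{events_per_sec / 1e6:.1f}M events/s on average")  # ~92.6M
print(f"Pinot: ~{queries_per_sec:,.0f} queries/s on average")      # ~2,315
```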

Real-time analytics is also playing a role in AVs and the future hybrid marketplace. AVs are fixed, capital-intensive assets with limited operating corridors and strong seasonality. Uber’s advantage is combining AVs with flexible human drivers to smooth demand peaks and troughs, improving utilization and economics for AV partners like Waymo.
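
The utilization argument can be illustrated with a toy model. Everything here is hypothetical: the demand curve, the fleet sizes, and the rule that AVs serve demand first up to a fixed hourly capacity while human drivers absorb the overflow. It is a sketch of the economics, not a description of Uber's dispatch logic.

```python
def split_demand(hourly_demand, av_capacity):
    """Split demand between a fixed AV fleet and flexible human drivers.

    AVs serve up to av_capacity trips each hour; humans cover the remainder.
    Returns (av_served, human_served, av_utilization).
    """
    if av_capacity <= 0 or not hourly_demand:
        raise ValueError("need a positive capacity and at least one hour")
    av_served = [min(d, av_capacity) for d in hourly_demand]
    human_served = [d - a for d, a in zip(hourly_demand, av_served)]
    utilization = sum(av_served) / (av_capacity * len(hourly_demand))
    return av_served, human_served, utilization


demand = [40, 80, 200, 120, 60]  # hypothetical trips per hour

# AV-only fleet sized for the peak hour: idle most of the day.
_, _, peak_sized = split_demand(demand, av_capacity=200)

# Smaller AV fleet, with human drivers covering the peaks.
_, humans, hybrid = split_demand(demand, av_capacity=100)

print(f"AV-only, peak-sized fleet utilization: {peak_sized:.0%}")  # 50%
print(f"Hybrid fleet AV utilization: {hybrid:.0%}")                # 76%
print("Trips covered by human drivers per hour:", humans)
```

In this toy case the smaller AV fleet is busy 76% of the time instead of 50%, because flexible human supply handles the 200-trip peak.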

This hybrid future rests on three pillars: real-time marketplace orchestration, AV fleet management (including charging and utilization for EVs), and a seamless rider experience where AVs and human drivers are abstracted behind the same app.

Learn more

Dive into more detailed explanations from Uber describing their journey towards using Apache Pinot as a consolidated real-time analytics platform for metrics, cohorting, and log analytics.

The easiest way to get up and running with Apache Pinot is on StarTree Cloud. StarTree offers a managed service, commercial integrations, and many extra features and capabilities. Book a demo to find out more about how you can get started with Apache Pinot.
