Uber’s Storage, Search, and Data (SSD) team aims to provide fast access to fresh data at scale. It maintains a consolidated platform known as EVA, built primarily on Apache Pinot. The platform is classified as a Tier 0 service, meaning it is among the most critical systems at Uber, and it supports functions that require sub-100 millisecond query latency and second-level data freshness.

Why Real-Time Analytics Matters at Uber
Real-time data is vital for three core aspects of Uber’s business:
- Actionable Insights: Tools like the “Restaurant Manager” allow owners to view real-time performance and address issues with orders immediately.
- Time-Sensitive Decisions: Fulfillment workflows for matching riders with drivers rely heavily on real-time signals to ensure efficient business operations.
- User Engagement: Real-time feedback, such as performance scorecards for freight carriers, improves the overall experience for users on the platform.

The Philosophy: Doing More with Less
The driving force behind Uber’s OLAP evolution is a philosophy of consolidation. Over the years, Uber moved from managing multiple siloed technologies to a lean, centralized team that focuses on the real-time analytics (RTA) domain rather than on specific tools. By maintaining as few technologies as possible per domain, Uber reduces engineering costs, minimizes customer confusion about which tool to use, and lets the team develop deep expertise in one primary technology: Apache Pinot.

Major Milestones in Consolidation
1. Transitioning from AresDB
In 2019, Uber open-sourced AresDB, a GPU-based system designed for low-latency ingestion. However, it was a single-box system, which capped how much data it could store, and turning it into a distributed system would have required immense effort. Furthermore, Uber lacked the extensive C++ expertise needed to maintain it. While migrating these workloads to Pinot, the team added critical features to the Apache Pinot ecosystem, including upsert functions, dim-table joins, and geospatial features.
2. Migrating the uMetrics Platform from Elasticsearch
Uber’s metrics platform, used for monitoring marketplaces and critical decision-making, originally relied on Elasticsearch. As data volume grew, the team faced challenges with reliability and high engineering costs.

The migration to Pinot took approximately two years and involved building a testing framework to shadow production queries, ensuring result parity and performance stability. The move resulted in significantly fewer incidents and alerts while simplifying the platform by deprecating customized systems built around Elasticsearch.
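The shadow-testing idea described above can be sketched in a few lines. This is a minimal illustration, not Uber’s framework: the function names (`results_match`, `shadow_queries`) and the in-memory "backends" are hypothetical stand-ins for real Elasticsearch and Pinot clients, and result parity is assumed to mean "same rows, in any order".

```python
from typing import Callable, Hashable, Iterable

Row = tuple[Hashable, ...]
QueryFn = Callable[[str], Iterable[Row]]


def results_match(legacy_rows: Iterable[Row], candidate_rows: Iterable[Row]) -> bool:
    """Compare two result sets while ignoring row order."""
    return sorted(legacy_rows, key=repr) == sorted(candidate_rows, key=repr)


def shadow_queries(
    queries: Iterable[str], run_legacy: QueryFn, run_candidate: QueryFn
) -> list[str]:
    """Replay each query on both backends and return the ones whose results differ."""
    mismatches = []
    for query in queries:
        if not results_match(run_legacy(query), run_candidate(query)):
            mismatches.append(query)
    return mismatches


# Hypothetical in-memory stand-ins for the legacy and candidate backends.
LEGACY = {"q1": [("sf", 10), ("nyc", 7)], "q2": [("sf", 3)]}
CANDIDATE = {"q1": [("nyc", 7), ("sf", 10)], "q2": [("sf", 4)]}

# "q1" differs only in row order, so only "q2" is reported.
mismatched = shadow_queries(["q1", "q2"], LEGACY.__getitem__, CANDIDATE.__getitem__)
```

A production version would also need to record latency for each backend and tolerate small floating-point differences in aggregates, both of which are omitted here.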
3. Solving the Join Challenge: MemSQL to Pinot
Uber’s user cohorting platform (used for marketing segmentation) was previously built on MemSQL. This setup faced reliability challenges and high licensing costs. A major hurdle for migration was that Pinot was originally designed for single-table analytics.
To overcome this, Uber collaborated with the community to develop a multi-stage query engine and a feature called collocated joins. Collocated joins allow large tables to be joined locally on the same server by partitioning them on the same primary key, which avoids expensive data shuffling over the network. This migration led to a 75% reduction in query latency and a 90% improvement in page load times.
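To make the collocated join idea concrete, here is a small sketch under stated assumptions. It is not Pinot’s implementation: `NUM_PARTITIONS`, `partition_of`, and the modulo partitioning scheme are hypothetical, and each "partition" simply stands in for the data held by one server. The point it illustrates is that when both tables are partitioned with the same function on the join key, matching rows always land in the same partition, so each partition can be joined on its own.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; stands in for the number of servers


def partition_of(key: int) -> int:
    """Map a join key to a partition; both tables must use this same function."""
    return key % NUM_PARTITIONS


def partition_table(rows, key_field):
    """Group rows by the partition their key maps to."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_of(row[key_field])].append(row)
    return partitions


def colocated_join(left_rows, right_rows, key_field):
    """Join two tables partition by partition, never comparing rows across partitions."""
    left_parts = partition_table(left_rows, key_field)
    right_parts = partition_table(right_rows, key_field)
    joined = []
    for pid, left_part in left_parts.items():
        # Build a hash index over the local right-hand partition only.
        index = defaultdict(list)
        for row in right_parts.get(pid, []):
            index[row[key_field]].append(row)
        for left in left_part:
            for right in index.get(left[key_field], []):
                joined.append({**right, **left})
    return joined


users = [
    {"user_id": 1, "city": "sf"},
    {"user_id": 2, "city": "nyc"},
    {"user_id": 5, "city": "sf"},
]
orders = [
    {"user_id": 1, "total": 20},
    {"user_id": 5, "total": 35},
    {"user_id": 1, "total": 5},
    {"user_id": 9, "total": 1},  # no matching user, so it is dropped (inner join)
]
result = colocated_join(orders, users, "user_id")
```

If the two tables used different partitioning functions, rows with the same key could sit on different servers and one side would have to be shuffled across the network before joining.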
4. The Future of Logging: Moving from ClickHouse
Uber is currently migrating its logging platform, known as ‘Sawmill’, from ClickHouse to Pinot. Pinot’s native Kafka support let Uber simplify the architecture by removing intermediate ingestors. To handle the massive scale of logs, Uber developed a log compression algorithm, CLP (Compressed Log Processor), which identifies common templates in log structures. In testing, it achieved a 169x compression ratio, which promises substantial cost savings.
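The template idea behind this kind of compression can be illustrated with a toy encoder. This is a simplified sketch, not the actual CLP algorithm: it assumes that any whitespace-delimited token containing a digit is a variable, and the names `encode`, `decode`, and `PLACEHOLDER` are hypothetical. Repeated log lines then share one stored template, and only the small variable values are kept per line.

```python
import re

# Assumption: any token containing a digit is treated as a variable value.
VARIABLE = re.compile(r"\S*\d\S*")
# A control character that is assumed not to appear in the log text itself.
PLACEHOLDER = "\x11"


def encode(message, templates):
    """Split a log line into a template ID plus its variable values."""
    variables = VARIABLE.findall(message)
    template = VARIABLE.sub(PLACEHOLDER, message)
    # Each distinct template is stored once and referenced by a small integer ID.
    template_id = templates.setdefault(template, len(templates))
    return template_id, variables


def decode(template_id, variables, templates):
    """Rebuild the original log line from its template ID and variables."""
    template = next(t for t, i in templates.items() if i == template_id)
    parts = template.split(PLACEHOLDER)
    pieces = [parts[0]]
    for value, rest in zip(variables, parts[1:]):
        pieces.extend([value, rest])
    return "".join(pieces)


logs = [
    "Task 17 finished in 342 ms",
    "Task 18 finished in 51 ms",
    "Connection to 10.0.0.5 lost",
]
templates = {}
encoded = [encode(line, templates) for line in logs]
decoded = [decode(tid, values, templates) for tid, values in encoded]
```

Here the first two lines share a single template, so the repeated text "Task … finished in … ms" is stored only once. Real systems add further steps, such as typed variable encoding and general-purpose compression of the resulting columns, which this sketch leaves out.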

Ongoing Innovations and Strategy

Beyond these migrations, Uber is exploring new frontiers, including:
- High Cardinality Metrics: Using Pinot’s indexing to support millions of unique entities, such as tracking order drops across millions of individual merchants.
- HTAP (Hybrid Transactional and Analytical Processing): Integrating with primary storage like Cassandra or MySQL to allow users to run both transactional and analytical queries seamlessly.
- Commitment to Open Source: Uber prioritizes technologies with active communities and merit-based governance, ensuring they can contribute back to the tools they depend on.

Key Takeaways
Managing Uber’s old data landscape was like a large international airport where every airline had its own private security, baggage claim, and air traffic control tower. It was functional but expensive and confusing to navigate.
Consolidating onto Pinot is like building a single, high-tech terminal where all flights share the same efficient infrastructure. It simplifies the journey for the passengers (the users) and makes the entire operation much easier for the ground crew (the engineers) to maintain and scale.
At StarTree, we work with many organizations like Uber that are consolidating analytics workloads onto Apache Pinot for lower costs and better performance. Explore some of our other case studies, including Stripe and Together.ai, or book a meeting with one of our solution experts to get advice on how to put Pinot to work for you.
