Apache Pinot & Real-Time Ad Analytics By UberEats
UberEats was one of the earlier adopters of Apache Pinot, and the company was able to leverage it across several interesting use cases. Here’s one example of how UberEats relied on Apache Pinot (along with other open source technologies) to deliver accurate, real-time ad analytics reliably and at scale.
UberEats and Real-Time Analytics
To understand this use case, it is important to understand UberEats’s model. UberEats presents a service to end users who wish to order food from local restaurants. As part of their strategy, UberEats began serving ads on their platform in 2019, allowing restaurants to appeal to hungry end users based on different factors such as location, time of day, search and order history.
Naturally, both Uber and restaurant partners/merchants want analytics on ad performance (impressions, clicks, conversions) based on ad events so that they can be compared to offers, actual purchase behavior, and ad attribution. Additionally, the financial dashboard needed to report gross bookings with corrected ride fares, and restaurant owners required analysis of UberEats orders with their latest delivery status.
The key requirements these use cases place on the underlying analytics platform are:
- Ability to ingest a huge volume of processed ad event data
- Low ingestion latency to offer fresh data for queries
- High QPS support and low query latency to serve a large number of external users
- Ability to incorporate the latest updates to real-time data
Traditional Data Warehousing versus OLAP Architecture
Existing solutions such as online transactional processing (OLTP) databases can serve accurate queries but will not scale for analytical queries at high QPS. Similarly, traditional data warehousing solutions are not built to serve high-throughput analytical queries that need low-latency responses. This blog covers the requirements in detail for a real-time online analytical processing (OLAP) platform across several use cases.
A lot has to go on behind the scenes to make this work. Architecturally, an ad event processing system needs to be in place to manage the flow of these events. That includes cleansing the events gathered, aggregating them, attributing them to orders, and putting the data into an accessible format for reports.
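To make the stages above concrete, here is a minimal Python sketch of cleansing, aggregating, and attributing ad events. It is only an illustration, not the UberEats implementation: the field names (`event_id`, `ad_id`, `type`, `user_id`, `ts`, `order_id`) and the last-click attribution rule are assumptions made for this example.

```python
from collections import Counter

# Hypothetical set of fields every valid ad event must carry.
REQUIRED_FIELDS = ("event_id", "ad_id", "type", "user_id", "ts")


def cleanse(events):
    """Drop events with missing fields and duplicates of an already-seen event_id."""
    seen = set()
    clean = []
    for event in events:
        if any(event.get(field) is None for field in REQUIRED_FIELDS):
            continue  # malformed event
        if event["event_id"] in seen:
            continue  # duplicate delivery of the same event
        seen.add(event["event_id"])
        clean.append(event)
    return clean


def aggregate(events):
    """Count events per (ad_id, event type), e.g. impressions and clicks per ad."""
    return Counter((event["ad_id"], event["type"]) for event in events)


def attribute(events, orders):
    """Map each order to the ad of the user's most recent earlier click.

    Last-click attribution is an assumption chosen to keep the sketch short.
    """
    attributed = {}
    for order in orders:
        clicks = [
            e for e in events
            if e["type"] == "click"
            and e["user_id"] == order["user_id"]
            and e["ts"] <= order["ts"]
        ]
        if clicks:
            attributed[order["order_id"]] = max(clicks, key=lambda e: e["ts"])["ad_id"]
    return attributed
```

In a real deployment each stage would run as a streaming job rather than over in-memory lists, but the division of responsibilities is the same.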
Apache Pinot was used to serve the analytical queries against this system because it could provide Uber and UberEats restaurant partners/merchants with accurate, real-time ad analytics reliably and at scale.
How Apache Pinot Solved the UberEats Use Case
Specifically, for this use case, Pinot was able to support the needs of a user-facing (merchant-facing) analytics dashboard, as captured in the requirements above, by delivering data freshness, high throughput, and low-latency query processing. Pinot’s upsert feature made it possible to consume a changelog from Apache Kafka, update existing data in Pinot, and deliver an accurate view in the real-time analytical results. Native support for upsert during real-time ingestion, made possible by code contributions from Uber’s Data Infrastructure team, greatly enhanced the data manipulation capability in Pinot and enabled a large category of use cases.
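The effect of upsert can be illustrated with a small Python sketch. This is not how Pinot implements the feature internally; it only shows the semantics a reader of the dashboard sees: for each primary key, the record with the latest comparison value wins. The function name `apply_upserts` and the fields `order_id`, `status`, and `updated_at` are hypothetical, chosen for this example.

```python
def apply_upserts(changelog, primary_key="order_id", comparison="updated_at"):
    """Return the latest record per primary key from a stream of changes.

    A record replaces the stored one only if its comparison value is at
    least as new, so late-arriving older updates do not overwrite newer state.
    """
    latest = {}
    for record in changelog:
        key = record[primary_key]
        current = latest.get(key)
        if current is None or record[comparison] >= current[comparison]:
            latest[key] = record
    return latest


# Example changelog: order 1 is updated twice, with one update arriving out of order.
changelog = [
    {"order_id": 1, "status": "placed", "updated_at": 1},
    {"order_id": 2, "status": "placed", "updated_at": 2},
    {"order_id": 1, "status": "delivered", "updated_at": 4},
    {"order_id": 1, "status": "picked_up", "updated_at": 3},  # late arrival
]
view = apply_upserts(changelog)
```

Queries over `view` see one row per order with its latest delivery status, which is the behavior the restaurant-facing dashboard described earlier depends on.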
Currently, the ad event processing system at UberEats is processing hundreds of millions of ad events per week, and the volume keeps growing. For those interested in the technical details of the system, the UberEats engineering blog has a great in-depth look at the system and the challenges it had to overcome: Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot. InfoQ has also published a summary.
We’d Love to Hear From You
Here are some ways to get in touch with us!
- Join our Slack channel: https://communityinviter.com/apps/apache-pinot/apache-pinot
- See our upcoming events: https://www.meetup.com/apache-pinot
- Follow us on Twitter: https://twitter.com/startreedata
- Subscribe to our YouTube channel: https://www.youtube.com/startreedata