User Story

Moloco: Apache Pinot and StarTree Smooth AdTech Company's Journey to Real-Time Ad Analytics

Moloco, a machine learning-driven AdTech company, uses Apache Pinot and StarTree to reduce query latency from 10 minutes to milliseconds.

Query latency: Milliseconds
Apache Pinot tables: 100+
Data ingested daily: 10 PB

Summary

  • Moloco processes 10 petabytes of ad data daily and averages 6 million queries per second.
  • Moloco reduced latency on analytics queries from several minutes (up to 10 minutes for complex queries) to milliseconds with indexes in StarTree and Apache Pinot.
  • StarTree helped reduce response times on queries used in A/B testing from five minutes to less than two seconds.
  • A command-line interface developed by an intern reduced manual setup times and enabled Moloco to scale Pinot to over 100 tables across 100 terabytes of data.
Moloco reduced query latency to milliseconds with StarTree and Apache Pinot

Moloco moves to StarTree and Apache Pinot for real-time ad analytics

In the high-speed world of advertising technology, time is money. Ad placement decisions are made in real time as users interact with websites or apps, with automated purchasing decisions typically made in less than a second. Programmatic advertising gives ad buyers greater control over placement and frequency than non-programmatic methods, which is why eMarketer expects it to account for over 90% of digital display ad spending this year. Global programmatic ad spending is expected to total $595 billion this year, rising to $800 billion by 2028.

Milliseconds can spell the difference between revenue and a lost customer in such a high-speed environment. Redwood City, CA-based Moloco offers programmatic advertising solutions fueled by machine learning (ML) to help optimize its clients’ acquisition, retention, and monetization campaigns. Its demand-side platform, responsible for handling over $1 billion in annual ad spending, enables clients to automate digital advertising purchases across multiple channels, optimizing ad placements in real time to enhance user acquisition and engagement.

The company processes about 10 petabytes of data daily and fields over 6 million queries per second. Moloco has adopted Apache Pinot and StarTree to anchor its real-time infrastructure as it has evolved from batch processing to real-time ad analytics.

Moloco ingests 10 petabytes of data daily, with 6 million-plus QPS

Data cardinality and complexity challenges led to 10-minute processing times

The move to real-time ad analytics brought several major challenges. The company’s internal analytics dashboards, particularly its performance analytics portal, faced significant delays because high-cardinality data and complex queries took up to 10 minutes to process. Because analysts accessed the data so frequently, those delays caused major productivity losses, said Hyun Min Choi, a Moloco software engineer, in a presentation at the 2024 Real-Time Analytics Summit. Fast, flexible data exploration was further hindered as dimensions were added, increasing data size and query latency exponentially.

“The [Moloco] portal is very important to us because it answers questions such as how much revenue is being generated right now, the number of clicks per campaign, and how many installs we are getting for each country,” Choi said.

Moloco’s in-house query processing framework, built on Google’s Bigtable NoSQL database service, achieved adequate performance on queries covering one or two dimensions. However, as the need for multidimensional data grew, Bigtable struggled with queries that often involved 40 to 50 dimension columns and high data cardinality.

Traditional data analytics infrastructure at AdTech company Moloco

Moloco’s A/B testing platform, crucial for assessing the performance of different advertising messages and audience targeting choices, required multiple layers of complex joins between the fact tables that store quantitative data and the dimension tables that provide context. Moloco analysts “might look at the dashboard 100 times a day,” Choi said. “If you think about 10-minute queries 100 times a day, that took a toll on our data analytics productivity.” It also impacted data analysts’ ability to assess metrics changes and revenue insights promptly.

Handling the vast data volumes needed for real-time analytics also necessitated a scalable, idempotent ingestion solution. Idempotence ensures that performing an operation multiple times produces the same result as doing it once. Processing logs into analytics tables and syncing them without duplication or data loss was essential for maintaining accurate and timely insights.
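To make the idempotence requirement concrete, here is a minimal sketch in Python. It is not Moloco's pipeline: the row shape, the `event_id` key, and the in-memory dictionary standing in for a table are all assumptions made for illustration. The point is only that writing each row under a unique key lets a retried batch overwrite existing rows instead of duplicating them.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (event_id, campaign, clicks). Names are illustrative only.
Row = Tuple[str, str, int]


def ingest(table: Dict[str, Row], batch: Iterable[Row]) -> Dict[str, Row]:
    """Upsert each row under its unique event_id so replays overwrite, not append."""
    for row in batch:
        event_id = row[0]
        table[event_id] = row
    return table


def total_clicks(table: Dict[str, Row]) -> int:
    """Sum the clicks metric across every row currently in the table."""
    return sum(row[2] for row in table.values())


batch = [("e1", "campaign-a", 3), ("e2", "campaign-b", 5)]
table: Dict[str, Row] = {}

ingest(table, batch)
once = total_clicks(table)

ingest(table, batch)  # simulate a retry that delivers the same batch again
twice = total_clicks(table)

print(once, twice)  # both totals match because the replay changed nothing
```

An append-only writer would have doubled the click count on the retry, which is exactly the duplication an idempotent ingestion process is meant to rule out.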

Reducing query latency to milliseconds with Pinot and StarTree

Moloco implemented Apache Pinot with StarTree indexes to tackle these issues, integrating Pinot with its Google Cloud-based infrastructure.

The introduction of StarTree indexes transformed the performance of Moloco’s data analytics, reducing query latency from several minutes to milliseconds. StarTree indexes allowed Moloco to pre-aggregate data across specified dimensions and metrics, which was crucial given the high cardinality of their datasets.
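The idea of pre-aggregation can be illustrated with a toy Python example. This is a simplified sketch, not Pinot's implementation: a real star-tree index is a tree with configurable split order and leaf-size thresholds, whereas the code below simply materializes every combination of dimensions. The dimension names, the `clicks` metric, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("country", "campaign")  # hypothetical dimension columns
STAR = "*"  # placeholder meaning "all values" of a dimension

rows = [
    {"country": "US", "campaign": "a", "clicks": 10},
    {"country": "US", "campaign": "b", "clicks": 5},
    {"country": "KR", "campaign": "a", "clicks": 7},
]


def build_preaggregates(rows):
    """Sum clicks for every combination of concrete and starred dimensions."""
    totals = defaultdict(int)
    for row in rows:
        for k in range(len(DIMENSIONS) + 1):
            for kept in combinations(DIMENSIONS, k):
                key = tuple(row[d] if d in kept else STAR for d in DIMENSIONS)
                totals[key] += row["clicks"]
    return dict(totals)


def query(totals, **filters):
    """Answer SUM(clicks) with a single lookup instead of a scan over raw rows."""
    key = tuple(filters.get(d, STAR) for d in DIMENSIONS)
    return totals.get(key, 0)


totals = build_preaggregates(rows)
us_clicks = query(totals, country="US")
campaign_a_clicks = query(totals, campaign="a")
all_clicks = query(totals)
print(us_clicks, campaign_a_clicks, all_clicks)
```

The cost moves from query time to ingestion time: each aggregate is computed once when data arrives, so a dashboard query becomes a lookup no matter how many raw rows sit behind it.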

Pinot enabled fast and efficient joins, supporting the joins of up to seven layers that A/B testing requires. By creating a large table with all required dimensions and metrics and optimizing for specific query patterns, Moloco achieved fast performance and flexibility without compromising existing use cases.

StarTree indexes are Pinot indexes that abstract and aggregate data. “You decide the dimension columns and metrics columns to use, and when you run aggregation queries with any subset of dimension columns and metrics columns, you get excellent core performance because the aggregation results have already been stored in the star-tree index,” Choi said. Queries against the A/B testing platform went from latencies of up to five minutes to less than two seconds.
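As a rough illustration of what "deciding the dimension columns and metrics columns" looks like in practice, the Python sketch below builds a star-tree index definition and prints it as JSON. The field names (`starTreeIndexConfigs`, `dimensionsSplitOrder`, `skipStarNodeCreationForDimensions`, `functionColumnPairs`, `maxLeafRecords`) follow Apache Pinot's table index configuration; the column names and values are assumptions and do not reflect Moloco's actual schema.

```python
import json

# Hypothetical columns; replace with the dimensions and metrics of a real table.
star_tree_index_config = {
    # Dimensions to pre-aggregate over, in the order the tree splits on them.
    "dimensionsSplitOrder": ["country", "campaign_id", "exchange"],
    # Dimensions for which no "all values" star node should be created.
    "skipStarNodeCreationForDimensions": [],
    # Aggregations to store, written as FUNCTION__column.
    "functionColumnPairs": ["SUM__revenue", "SUM__clicks"],
    # Upper bound on records scanned at a leaf before the tree splits further.
    "maxLeafRecords": 10000,
}

# The index definition sits inside the table's index configuration.
table_index_config = {"starTreeIndexConfigs": [star_tree_index_config]}

config_json = json.dumps({"tableIndexConfig": table_index_config}, indent=2)
print(config_json)
```

With a definition like this, an aggregation such as `SELECT country, SUM(revenue) FROM ad_events GROUP BY country` (table name hypothetical) groups by a subset of the configured dimensions and sums a configured metric, so it is the kind of query the index is designed to serve from stored results.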

Scaling Pinot usage to 100+ tables

Moloco incrementally integrated Pinot as an extension of its existing data architecture. By starting with a few core use cases and expanding gradually, data engineers could minimize operational disruption while moving steadily toward real-time ad analytics.

Moloco's data analytics infrastructure with Apache Pinot and StarTree

StarTree’s SqlConnectorBatchPushTask, a configuration option within Apache Pinot’s Minion framework that facilitates batch data ingestion from SQL-based data sources into Pinot tables, was a critical asset. Moloco established an idempotent ingestion process from BigQuery and other SQL data sources to enable precise data synchronization without duplication. File ingestion tasks synchronized dimension tables from Google Cloud Storage, further optimizing data flow into Pinot.

An internal command-line interface developed by an intern enabled data engineers to create, delete, and update Pinot tables on demand, supporting rapid deployment across new use cases. The CLI significantly reduced manual setup times.

“Without the CLI tool we would have spent a lot of time clicking buttons on the browser and not been able to automate any of our use cases,” Choi said.
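Moloco's CLI is internal and its design is not described here, but a tool of this kind can be sketched in a few lines of Python. The sketch below assumes the table operations of the Pinot controller REST API (`POST /tables`, `PUT /tables/{tableName}`, `DELETE /tables/{tableName}`); the program name `pinot-tables`, its flags, the default controller address, and the table name `ad_events` are hypothetical.

```python
import argparse
import json
import urllib.request


def build_request(action, controller, table_name, config=None):
    """Map a CLI action onto the matching Pinot controller REST call."""
    base = controller.rstrip("/")
    if action == "create":
        url, method = f"{base}/tables", "POST"
    elif action == "update":
        url, method = f"{base}/tables/{table_name}", "PUT"
    elif action == "delete":
        url, method = f"{base}/tables/{table_name}", "DELETE"
    else:
        raise ValueError(f"unknown action: {action}")

    data, headers = None, {}
    if action in ("create", "update"):
        if config is None:
            raise ValueError(f"{action} requires a table config")
        data = json.dumps(config).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="pinot-tables")  # hypothetical name
    parser.add_argument("action", choices=["create", "update", "delete"])
    parser.add_argument("table_name")
    parser.add_argument("--controller", default="http://localhost:9000")
    parser.add_argument("--config-file", help="path to a table config JSON file")
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args(argv)

    config = None
    if args.config_file:
        with open(args.config_file) as fh:
            config = json.load(fh)

    request = build_request(args.action, args.controller, args.table_name, config)
    if args.dry_run:
        # Show what would be sent without contacting the controller.
        print(request.get_method(), request.full_url)
        return
    with urllib.request.urlopen(request) as response:
        print(response.status)


# Dry run: prints the request that a delete would issue.
main(["delete", "ad_events", "--dry-run"])
```

Keeping request construction in `build_request`, separate from the network call, makes the tool easy to test and to script, which is the kind of automation the quote above contrasts with clicking through a browser UI.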

By leveraging StarTree indexes and integration capabilities, Moloco reduced analytics query latency from as long as 10 minutes to milliseconds, delivering substantial productivity gains and enabling real-time insights. Queries on the A/B testing platform now return in less than two seconds, down from as long as five minutes, supporting fast, data-informed decisions.

Moloco now has more than 100 tables in its cluster, totaling 100 terabytes of data. Just two people, Choi and an intern, initiated the project, and adoption grew over several months.

Moloco has 100+ tables in their Pinot cluster and 100 terabytes of data

“Keeping things simple and incremental helped immensely in impact,” Choi said. “We considered Pinot a natural extension of our existing data infrastructure. The project has been wildly successful, and a lot of people at Moloco now want to use Pinot.”

Watch Moloco’s talk

Watch Moloco’s full talk from Real-Time Analytics Summit 2024 to see how the company uses Apache Pinot and StarTree to power AdTech analytics dashboards.

Discover StarTree Cloud

If this case study sparks ideas for your own data architecture and use cases, you’re encouraged to learn more about StarTree Cloud and its advantages over open-source Apache Pinot. If you have already heard enough, feel free to schedule a demo or sign up for a 30-day free trial.
