Uber Rewrites the Rules of Observability with Pinot Time-Series Engine

Session title: Time Series Query Engine for Apache Pinot

Modern observability systems are expected to offer fast, real-time insights into application health and performance. However, traditional SQL-based approaches can buckle under the complexity of time series analysis, especially when dealing with missing data, changing time resolutions, and high user concurrency.

At Uber, these issues became significant enough to warrant a foundational change in handling time series queries. The ride-hailing giant built a custom time series query engine based on Apache Pinot that is already powering over 100,000 alerts in production. Uber Software Engineer Ankit Sultana described the development team’s thinking in a presentation at the Real-Time Analytics Summit hosted by StarTree.

SQL’s observability shortcomings

Uber had long used Apache Pinot for real-time analytics due to its millisecond-latency queries and seamless ingestion from batch and streaming sources. One of Pinot’s most popular features is its support for SQL. But when developers began using Pinot for observability, particularly for charting metrics like query volume or service latency, SQL showed its limitations.

Creating time-based graphs in tools like Grafana required users to manually construct complex SQL queries using functions like DATE_TRUNC, time filters, and GROUP BY clauses to round timestamps into uniform time buckets, narrow queries to a specific window, and aggregate the data by each bucket. Even minor configuration mismatches – such as incrementing time in milliseconds instead of seconds – could trigger errors or return misleading results.
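The three steps described above (rounding timestamps into buckets, restricting to a window, and aggregating per bucket) can be sketched in plain Python. This is only an illustration of the logic, not Pinot SQL or any Uber code; the function name, parameters, and sample timestamps are all hypothetical.

```python
from collections import defaultdict

def bucket_counts(events_ms, start_s, end_s, bucket_s):
    """Count events per fixed-width bucket inside the window [start_s, end_s).

    events_ms holds epoch timestamps in MILLISECONDS (hypothetical input);
    the window and bucket width are given in SECONDS.
    """
    counts = defaultdict(int)
    for ts_ms in events_ms:
        ts_s = ts_ms // 1000                      # align units: millis -> seconds
        if start_s <= ts_s < end_s:               # the time filter
            bucket_start = ts_s - (ts_s % bucket_s)  # round down to the bucket start
            counts[bucket_start] += 1             # aggregate per bucket
    return dict(sorted(counts.items()))

events_ms = [0, 15_000, 59_999, 60_000, 185_000]
print(bucket_counts(events_ms, start_s=0, end_s=240, bucket_s=60))
# prints {0: 3, 60: 1, 180: 1}
```

Two things are worth noticing in this toy run. Dropping the `// 1000` conversion would silently push every event except the first outside the window, which is the kind of unit mismatch the article describes. And the bucket starting at 120 seconds is simply absent from the result, because no event fell into it.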

SQL’s LIMIT clause, which improves performance by limiting how much data a query returns, introduced another complication. As users increased time resolution or widened their query ranges, they hit row limits, resulting in incomplete graphs. Missing data points created by ingestion gaps or sparse events caused dashboards to misbehave or fail, particularly during week-over-week comparisons or success rate calculations.

Rather than attempting to patch SQL with macros and post-processing logic, Uber’s real-time analytics team built a dedicated time-series query engine for Pinot. Their goal was to offer a simpler, more intuitive interface for observability while preserving Pinot’s performance and flexibility.

Crucially, the engine supports domain-specific query languages like M3QL and PromQL, which are used in open-source observability stacks, so users no longer need to write complex SQL. Queries like moving averages, gap filling, and time-shift comparisons can be expressed in plain, readable syntax that closely matches user intent. Sultana said the team built an M3QL plug-in for internal use that may eventually be released as open source.

“We built it so that you can run any time-series query language against Pinot,” Sultana said. “The query language is completely pluggable.”

Even better, the engine runs on existing Pinot tables without refactoring data models or migrating schemas. That flexibility helped Uber onboard users quickly and scale up adoption.

Hurdles to clear

Designing the new engine wasn’t without its challenges. One major hurdle was the number of combinations generated when grouping by time and dimensions like tenant or service. Time bucket multiplication could easily blow past SQL row limits, especially over long time windows.
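A little arithmetic shows how quickly this multiplication grows. The numbers below (a seven-day window, one-minute buckets, 50 services, and a 100,000-row limit) are illustrative assumptions, not figures from the talk.

```python
def result_rows(range_s, bucket_s, series_count):
    """Worst-case rows when every series has a value in every time bucket."""
    buckets = -(-range_s // bucket_s)  # ceiling division without floats
    return buckets * series_count

SEVEN_DAYS_S = 7 * 24 * 3600
ASSUMED_ROW_LIMIT = 100_000  # hypothetical limit, for illustration only

rows = result_rows(SEVEN_DAYS_S, bucket_s=60, series_count=50)
print(rows)                      # prints 504000
print(rows > ASSUMED_ROW_LIMIT)  # prints True
```

Under these assumptions the query would need roughly five times more rows than the limit allows, so a chart built from it would be cut short.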

To resolve this, the new engine automatically adjusts time resolution based on the requested time range. Short windows, such as the most recent hour, retain high granularity, while longer views, like 30 days, use coarser buckets. This adjustment keeps queries performant without user intervention.
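The article does not say how the engine chooses a resolution, so the following is only one plausible approach: pick the finest bucket width from a fixed list that keeps the number of points per series under a cap. The candidate widths and the cap of 1,000 points are assumptions made for this sketch.

```python
import math

# Hypothetical candidate bucket widths in seconds: 10s, 1m, 5m, 15m, 1h, 6h, 1d.
CANDIDATE_BUCKETS_S = [10, 60, 300, 900, 3600, 21600, 86400]

def pick_bucket(range_s, max_points=1000):
    """Return the finest bucket width that yields at most max_points buckets."""
    for bucket_s in CANDIDATE_BUCKETS_S:
        if math.ceil(range_s / bucket_s) <= max_points:
            return bucket_s
    return CANDIDATE_BUCKETS_S[-1]  # fall back to the coarsest width

print(pick_bucket(3600))        # last hour -> prints 10
print(pick_bucket(30 * 86400))  # 30 days   -> prints 3600
```

With these assumed settings, a one-hour view keeps 10-second buckets while a 30-day view drops to hourly buckets, which matches the behavior described above.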

Another challenge was the need for gap filling, or inserting zero values where data was missing, so downstream calculations wouldn’t fail. Pinot’s SQL-based GAPFILL function was an option, but was judged too difficult to use. The new engine includes an M3QL keyword that transforms gaps into filled time buckets with one line of code.

“Going from gapful to gapless data is very easy,” Sultana said. “You can transform nulls to 0 or any custom value.”
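Conceptually, gap filling means walking every expected bucket in the window and substituting a default wherever a value is missing or null. The sketch below shows that idea in Python; it is not the M3QL keyword itself, and the function name and sample data are hypothetical.

```python
def gap_fill(series, start_s, end_s, bucket_s, fill=0):
    """Return one (bucket_start, value) pair for every bucket in [start_s, end_s).

    series maps bucket start times to values; missing keys and None values
    are both replaced with fill.
    """
    filled = []
    for bucket_start in range(start_s, end_s, bucket_s):
        value = series.get(bucket_start)
        filled.append((bucket_start, fill if value is None else value))
    return filled

sparse = {0: 3, 60: 1, 180: 1}  # nothing was recorded for the bucket at 120
print(gap_fill(sparse, start_s=0, end_s=240, bucket_s=60))
# prints [(0, 3), (60, 1), (120, 0), (180, 1)]
```

Once every bucket is present, two series covering the same window line up point for point, which is what ratio calculations such as success rates and week-over-week comparisons rely on.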

The team also had to ensure that the moving aggregations that smooth out spiky metric graphs were straightforward to apply without losing resolution. SQL typically requires window functions, which many users find intimidating. The time series engine Uber built simplifies this with built-in moving functions, enabling one-hour rolling averages, percent changes, and other operations with minimal syntax.
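As a rough picture of what a built-in moving function does, here is a trailing moving average over evenly spaced buckets. It emits one output per input point, so the graph keeps its original resolution. This is a generic sketch with hypothetical names, not the engine's implementation.

```python
from collections import deque

def moving_average(values, window):
    """Trailing average over the last `window` points, one output per input."""
    recent = deque()
    total = 0.0
    averages = []
    for value in values:
        recent.append(value)
        total += value
        if len(recent) > window:
            total -= recent.popleft()  # drop the point that left the window
        averages.append(total / len(recent))
    return averages

print(moving_average([2, 4, 6, 8], window=2))
# prints [2.0, 3.0, 5.0, 7.0]
```

With one-minute buckets, a one-hour rolling average would correspond to `window=60`. Early points average over however many values are available so far, which is one of several reasonable conventions.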

Rapid adoption

Uber quickly integrated the time series engine into its observability stack, using it to monitor Pinot clusters, query loads, and application performance metrics. The results have been transformative, with over 100,000 alerts relying on the engine.

Users can now create rich, interactive dashboards without needing to know SQL. Queries deftly handle missing data, scale across wide time ranges, and return results quickly. Users can still fall back to SQL for tasks such as creating pie charts or running ad hoc exploratory queries.

The pluggable design means organizations can bring their preferred time series languages to Pinot, further lowering barriers to adoption. The query execution layer abstracts away language-specific semantics, allowing developers to plug in new operators and planners without modifying core Pinot code.
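One common way to get this kind of pluggability is a registry that maps a language name to a planner, so adding a language means registering a function rather than editing the core. The sketch below illustrates that pattern only. It is not Pinot's plug-in interface, and the "toy-pipe" language, its syntax, and every name in the code are invented for the example.

```python
from typing import Callable, Dict, List

Plan = List[str]  # a plan here is just an ordered list of stage descriptions
PLANNERS: Dict[str, Callable[[str], Plan]] = {}

def register_language(name: str):
    """Decorator that registers a planner function under a language name."""
    def decorator(planner: Callable[[str], Plan]) -> Callable[[str], Plan]:
        PLANNERS[name] = planner
        return planner
    return decorator

@register_language("toy-pipe")  # made-up language, not M3QL or PromQL
def plan_toy_pipe(query: str) -> Plan:
    """Split a pipe-delimited query into ordered stages."""
    return [stage.strip() for stage in query.split("|") if stage.strip()]

def plan(language: str, query: str) -> Plan:
    """Core entry point: dispatch to whichever planner the language registered."""
    if language not in PLANNERS:
        raise ValueError(f"no planner registered for {language!r}")
    return PLANNERS[language](query)

print(plan("toy-pipe", "fetch requests | sum by service | moving_avg 1h"))
# prints ['fetch requests', 'sum by service', 'moving_avg 1h']
```

The `plan` function never needs to know how a given language is parsed, which is the separation the pluggable design aims for.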

Although the engine is currently considered experimental, Uber is working toward a stable release. Planned enhancements include support for exemplars for deeper analysis and additional tooling to simplify configuration.

The time series engine has cemented Apache Pinot’s role as a comprehensive observability platform. It supports metrics, logs, events, and traces across batch and streaming data. Pinot’s low-latency performance and real-time ingestion capabilities make it a unified solution for user-facing analytics and operational monitoring.

Open Source Observability with Apache Pinot and StarTree Cloud

Learn more about how you can build an open-source observability platform in this practical guide – O’Reilly: Open Source Observability.

StarTree Cloud is a quick and easy way to experiment with the new time-series query engine in Apache Pinot. Book a meeting and we’ll get you started.
