Apache Pinot – a scalable alternative to Elasticsearch for log analytics

Apache Pinot™ and Elasticsearch look like they were built for entirely different purposes, because they were. But as log data has evolved from something you search to something you analyze, these two systems now find themselves overlapping on the same critical workload: log analytics.

Written By

Chad Meley

SVP, DevRel

Published

December 5, 2025

Elasticsearch originated as a search engine. Its strength lies in indexing and retrieving full-text documents, making it incredibly powerful for use cases like product search, log search, and website indexing. Apache Pinot, on the other hand, is designed for real-time analytics. It was architected to answer complex, high-speed aggregation queries across massive datasets with consistently low latency.

So how did these two very different systems both end up making the short list as a viable solution for this workload? The answer lies in how modern organizations are trying to extract more value from their logs.

Traditionally, logs were searched—when did this error happen? What user triggered this exception? Elasticsearch was perfect for that. But the nature of questions has evolved. Today, teams want to analyze logs—spot patterns, aggregate across time windows, slice across multiple dimensions like region, app version, or device type, and do it all in real time.

If your goal is to search logs, Elasticsearch remains a solid choice. But if your goal is to analyze them—especially at scale—Apache Pinot is likely the better tool for the job.

In the sections that follow, we’ll walk through why, looking at:

What it takes to get from unstructured logs to structured analytics
How Pinot’s architecture supports real-time performance and up-to-date insights
Why companies like Uber, Cisco, and Uniqode rely on Pinot for these use cases

Columnar vs Document Architecture for Log Analytics

Logs are messy. They often start as text files, semi-structured JSON, or key-value pairs. But logs contain valuable structure. Each entry typically includes elements like:

Timestamps
IP addresses
Status codes
User identifiers
Error messages
Request parameters

While Elasticsearch treats these as parts of a single document to be searched through, Pinot treats each as a distinct column to be analyzed. This fundamental difference changes everything about how logs can be processed and queried.
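To make the "distinct column" idea concrete, here is a minimal Python sketch that parses raw log lines into per-field columns. The log layout, the field names, and the `to_columns` helper are assumptions made up for illustration; this is not how Pinot ingests data internally.

```python
import re
from collections import defaultdict

# Hypothetical log layout: timestamp, client IP, user ID, status code, message.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+) (?P<ip>\S+) (?P<user_id>\S+) (?P<status>\d{3}) (?P<message>.*)"
)

def to_columns(lines):
    """Turn raw log lines into a dict of column name -> list of values."""
    columns = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        for name, value in match.groupdict().items():
            columns[name].append(int(value) if name == "status" else value)
    return dict(columns)

raw_logs = [
    "2025-12-05T10:00:00Z 10.0.0.1 user42 200 GET /home ok",
    "2025-12-05T10:00:01Z 10.0.0.2 user7 500 NullPointerException in checkout",
    "2025-12-05T10:00:02Z 10.0.0.1 user42 200 GET /cart ok",
]

columns = to_columns(raw_logs)
# An analytical question now reads one column instead of every full entry.
error_count = sum(1 for status in columns["status"] if status >= 500)
print(error_count)
```

Once the fields live in separate lists, counting server errors only reads the `status` values; the message text is never touched.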

Column storage
Lower compute costs: only the relevant columns are loaded for analytics.
Lower storage costs: 5-10x better compression ratios due to dictionary encoding.

Document storage
Excessive compute costs: a full document scan is required for analytics.
Bloated storage costs: row-oriented design has poor compression.

When log data is stored in columns rather than documents, two critical advantages emerge:

Dramatic compression and reduced storage requirements: Similar values in each column compress efficiently, reducing storage needs by 60-80% compared to document-based storage.
Faster analytics performance: Queries can access only the specific columns they need instead of scanning entire documents, enabling aggregate queries that are 5-150x faster than in Elasticsearch.
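A rough way to see why repeated values in a column compress well is dictionary encoding: store each distinct value once and replace every occurrence with a small integer ID. The sketch below is a simplified illustration in Python, not Pinot's actual storage format, and the `region` values are invented sample data.

```python
def dictionary_encode(values):
    """Replace each value with an integer ID that points into a sorted dictionary."""
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in values]

# Hypothetical 'region' column from nine log entries with only three distinct values.
region_column = [
    "us-east", "eu-west", "us-east", "us-east", "ap-south",
    "eu-west", "us-east", "us-east", "eu-west",
]

dictionary, encoded = dictionary_encode(region_column)

# Characters stored as raw strings versus characters stored once in the dictionary.
raw_chars = sum(len(value) for value in region_column)
dict_chars = sum(len(value) for value in dictionary)

# Filtering compares small integers and never touches the other columns.
us_east_id = dictionary.index("us-east")
us_east_count = encoded.count(us_east_id)

print(dictionary, encoded, raw_chars, dict_chars, us_east_count)
```

The more often a value repeats, which is typical for fields like region, status code, or app version, the larger the gap between the raw strings and the dictionary plus small IDs.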

Most analytical use cases benefit greatly from structure: you don’t just want to search for a string, you want to analyze patterns, distributions, time windows, and anomalies. When logs are decomposed into structured columns (timestamp, error code, and so on), each column can be individually indexed and optimized for fast, concurrent queries.

That leads us to one of Pinot’s superpowers: flexible indexing.

Indexing for Analytics vs Search

When it comes to analyzing logs, Elasticsearch and Apache Pinot take fundamentally different approaches to indexing that reflect their design priorities.

Elasticsearch was built for text search, not analytics. It uses inverted indexes optimized for keyword and full-text matching, which excel at finding documents that contain specific terms but perform poorly for aggregations, time-series analysis, or high-cardinality metrics. Because it’s row-oriented, each log entry is stored as a separate document, not only inflating storage but also slowing down group-bys and aggregations.

Apache Pinot, by contrast, was designed as a columnar OLAP engine optimized for analytical queries. It employs forward indexes, sorted indexes, and specialized data structures such as star-trees, range indexes, and Bloom filters to accelerate scans, aggregations, and time-based filters over massive datasets. The result is much faster query performance on metrics and trends with far lower storage overhead. The trade-off: Pinot isn’t ideal for free-form text search or fuzzy matching — it’s a purpose-built engine for sub-second analytics on structured and semi-structured log data, not a search engine for raw log retrieval.

One of the key advantages of using Pinot for log analytics is its ability to apply indexes based on each column’s data type, optimizing for fast, concurrent queries.

Inverted index: For filtering on categorical fields
Range index: For efficient numerical filtering and rollups
Sorted index: For faster group-bys and top-K queries
H3 Geo index: For location-based queries
JSON index: For semi-structured log content
Text index: For regex and partial matches
Star-tree index: Pinot’s unique optimization that pre-aggregates common query patterns, reducing scan costs by over 90%.

You can mix and match these indexes per column, depending on what kind of queries you’re running. This is a massive performance advantage if you’re analyzing logs, not just searching them.
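As a sketch of what mixing and matching can look like, the snippet below builds the index section of a Pinot table config as a Python dictionary and prints it as JSON. The key names follow Pinot's `tableIndexConfig` as we understand it, so check them against the documentation for your version. All column names are hypothetical, and geo and text indexes are left out of this sketch.

```python
import json

# Hypothetical column names for a crash-log table; adjust to your own schema.
table_index_config = {
    "tableIndexConfig": {
        "invertedIndexColumns": ["region", "app_version", "device_type"],
        "rangeIndexColumns": ["latency_ms"],
        "sortedColumn": ["error_code"],
        "bloomFilterColumns": ["user_id"],
        "jsonIndexColumns": ["request_params"],
        "starTreeIndexConfigs": [
            {
                # Dimensions to pre-aggregate over, in split order.
                "dimensionsSplitOrder": ["region", "app_version", "device_type"],
                "skipStarNodeCreationForDimensions": [],
                # Aggregations to precompute, written as FUNCTION__column.
                "functionColumnPairs": ["COUNT__*", "SUM__latency_ms"],
                "maxLeafRecords": 10000,
            }
        ],
    }
}

def indexed_columns(config):
    """Collect every column named in one of the per-column index lists."""
    columns = set()
    for key, value in config["tableIndexConfig"].items():
        if key.endswith("Columns") or key == "sortedColumn":
            columns.update(value)
    return columns

print(json.dumps(table_index_config, indent=2))
```

The `indexed_columns` helper is only a convenience for this example, useful for checking that every indexed column actually exists in your schema before you submit the config.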

This selective, purpose-built indexing approach in Pinot provides higher performance and flexibility for analytical workloads—especially at scale.

Cold Storage of logs without the slowdown

As log data volumes grow, so do storage costs, especially when historical data needs to remain queryable. Most systems force a tradeoff: either pay a premium for high-performance storage or accept painfully slow queries on cheaper, colder tiers. This is especially true for Elasticsearch, where accessing older data often means rehydrating entire segments just to answer a single question.

But Pinot, via StarTree Cloud, takes a smarter approach. Its precise fetching feature not only maintains fast query performance over cloud object storage data (S3 / GCS / Azure Blob Storage)—it also significantly reduces infrastructure costs by avoiding unnecessary data scans and movement. Here’s how:

Scan only what you need: fetch specific columns, not entire rows, slashing I/O and speeding up performance.
Target the right blocks: block-level reads only target the data that matches your query, eliminating unnecessary overhead.
Pipeline execution: fetching begins during query planning, so I/O and execution overlap. This overlap cuts query latency, often by 5x or more.
Smarter query paths: with index pinning and pruning, irrelevant segments are skipped entirely using metadata like min/max values and Bloom filters. Pinning keeps critical index structures hot, so reads focus only on the most relevant data blocks.
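The pruning idea in the last point can be sketched in a few lines of Python. This is a conceptual illustration under assumed names (`SegmentMeta`, `prune_segments`, integer timestamps), not StarTree's implementation: each segment carries the minimum and maximum timestamp it contains, and any segment whose range does not overlap the query window is never fetched from object storage.

```python
from dataclasses import dataclass

@dataclass
class SegmentMeta:
    """Hypothetical per-segment metadata: name plus min/max timestamp."""
    name: str
    min_ts: int
    max_ts: int

def prune_segments(segments, query_start, query_end):
    """Keep only segments whose [min_ts, max_ts] range overlaps the query window."""
    return [
        segment for segment in segments
        if segment.max_ts >= query_start and segment.min_ts <= query_end
    ]

segments = [
    SegmentMeta("seg_0", 0, 99),
    SegmentMeta("seg_1", 100, 199),
    SegmentMeta("seg_2", 200, 299),
    SegmentMeta("seg_3", 300, 399),
]

# A query over timestamps 150..220 only needs two of the four segments.
to_fetch = prune_segments(segments, 150, 220)
print([segment.name for segment in to_fetch])
```

Because the decision uses only small metadata records, it can be made before any data blocks are read from the cold tier.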

Precise fetching in StarTree’s tiered storage lets organizations economically store massive log datasets while maintaining interactive query performance, regardless of data age or storage tier. For observability teams, this means they can access and analyze months or years of historical logs with sub-second latency, enabling more powerful pattern matching over longer time horizons. The result: faster root cause analysis, improved anomaly detection, and fewer blind spots in system health.

Discover issues immediately: data freshness at scale

Because Pinot is built to ingest from streams like Kafka and make data available for querying within seconds, you can act on what just happened, not what happened an hour ago. This is especially important in logs where fast anomaly detection is essential: mobile crash monitoring, security incident response, system observability.

Pinot doesn’t just keep up with the stream; it handles scale: more than 10 million events per second from Kafka, without throwing hardware at the problem.

In contrast, Elasticsearch clusters tend to balloon as data grows. This increases cost and complexity. Query performance degrades and tuning becomes a never-ending project.

Since migrating from Elasticsearch to StarTree, system reliability has dramatically improved with far fewer alerts, scalability bottlenecks have been resolved, and we’ve reduced infrastructure costs by 70%—saving over $2M annually—while cutting CPU cores by 80%.
Uber Healthline

90% Reduction in Infrastructure, 10X Faster Queries: Uber Healthline

Uber’s Healthline system ingests more than 1,500 mobile crash reports per second and classifies them into buckets called issues. That’s 36TB of log data per day, retained for 45 days. These logs need to be:

Parsed
Compressed
Flattened
Sampled
Queried with low latency

The queries range from filtering by region, app version, and time range, to generating histograms and dashboards for release managers.
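A query of this shape might look like the following sketch, which builds a SQL string in Python. The table name `crash_reports` and the columns `region`, `app_version`, and `ts` are assumptions for illustration, not Uber's actual schema, and the helper is a simple string builder rather than a Pinot client call.

```python
def crashes_by_version_sql(table, region, start_ms, end_ms, limit=10):
    """Build a SQL query counting crashes per app version for one region and time range."""
    safe_region = region.replace("'", "''")  # escape single quotes in the literal
    return (
        f"SELECT app_version, COUNT(*) AS crash_count "
        f"FROM {table} "
        f"WHERE region = '{safe_region}' "
        f"AND ts BETWEEN {int(start_ms)} AND {int(end_ms)} "
        f"GROUP BY app_version "
        f"ORDER BY COUNT(*) DESC "
        f"LIMIT {int(limit)}"
    )

# Hypothetical table, region, and epoch-millisecond window.
query = crashes_by_version_sql("crash_reports", "us-east", 1764892800000, 1764896400000)
print(query)
```

A filter on a few columns followed by a grouped count is the kind of aggregation that benefits from columnar storage and per-column indexes, since only `region`, `ts`, and `app_version` need to be read.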

Elasticsearch couldn’t keep up. Queries would time out. Clusters grew unmanageable. Once the team migrated to Pinot, they saw:

10x faster queries
90% reduction in infrastructure (7x less memory, 10x fewer cores)
Sub-second latencies across large time ranges
Zero query timeouts, even at peak

Pinot gave them the speed and insight they needed to confidently roll out app updates and catch regressions in near real-time.

Summary

While Elasticsearch remains an excellent choice for search-oriented applications, organizations analyzing logs for patterns, trends, and metrics consistently find better performance, lower costs, and reduced operational complexity with Apache Pinot.

Storage efficiency: Pinot typically reduces log storage requirements by 5-10x compared to Elasticsearch through better compression of similar values.
Query performance: Analytics queries only need to access and process relevant columns, resulting in 5-150x faster query performance compared to Elasticsearch’s full document scanning.
Scalability: Companies like Cisco Webex and Uber have reduced their cluster sizes by 80-90% when switching from Elasticsearch to Apache Pinot for log analytics.
Cost savings: The combined benefits of reduced storage needs and smaller cluster sizes translate to significant infrastructure cost savings.

You don’t need to choose one or the other. Many organizations use both—Elasticsearch for searching logs, Pinot for analyzing logs.

Next steps: Explore Apache Pinot for Log Analytics

The quickest way to experience the power of Apache Pinot for log analytics is with StarTree Cloud. Book a Demo or Request a Trial to get started with a no-commitment trial account to explore and test. When you’re ready for production, you can move to one of our cost-effective packages that fits your business.
