
Disaggregating Observability with Apache Kafka, StarTree Cloud, and Grafana


Neha Pawar
Head of Data Infra
January 14, 2025 · 14 min read

Introduction

Recently, I published a blog titled Reimagining Observability: The Case for a Disaggregated Observability Stack, where I broke down the core layers of the observability (o11y) stack — agents, collection, storage, query, and visualization. In that article, I highlighted common challenges with traditional all-in-one solutions, including inflexibility and high costs, and why many companies are shifting to a disaggregated stack.

In this follow-up, I’ll dive into how Apache Pinot strengthens the disaggregated o11y stack by addressing challenges in ingestion, storage, query, and visualization. I’ll also explore key features unique to StarTree Cloud that further enhance the solution.

A quick recap of the disaggregated observability stack

Before diving into the capabilities of Apache Pinot, let’s briefly revisit the limitations of all-in-one o11y solutions and how a disaggregated stack overcomes them.

Most o11y solutions today are bundled as all-in-one offerings, which often result in high costs and limited flexibility. Data is locked within the vendor’s stack, making it challenging to repurpose for other needs. Vendors invest heavily in proprietary agents tailored to their specific formats, which drives up overall solution costs. Moreover, companies with unique data governance requirements may find it difficult to integrate these proprietary agents without significant customization. All-in-one stacks also require agents to transfer large volumes of o11y data from customer accounts to the vendor’s account, leading to high egress costs.

A disaggregated stack, on the other hand, separates the o11y layers — agents, collection, storage, query, and visualization — allowing you to choose specialized systems that excel in each layer rather than relying on a single vendor for an end-to-end solution. This flexibility enables a more tailored and efficient approach, resulting in improved cost-effectiveness and adaptability.

Disaggregated observability stack versus an all-or-nothing stack

For example, let’s look at some technologies that you could leverage with a disaggregated stack and what advantages they provide:

  • Agents: With standards like OpenTelemetry, agents have become commoditized, making it easy to send data to multiple backends. This standardization eliminates the need for vendor-specific agents and provides greater flexibility.
  • Collection: For data collection, streaming systems like Apache Kafka and RedPanda are widely used, often already embedded within the data ecosystem. These systems are designed to handle high-throughput, real-time ingestion at scale, and they’re format-agnostic, seamlessly integrating with OpenTelemetry and similar standards. They also provide robust connector ecosystems and native integrations with various storage options.
  • Storage & Query: For storage and query, specialized systems — such as real-time OLAP stores like Apache Pinot, ClickHouse or search-engine-based systems like Elasticsearch — offer much more efficient handling of metrics, logs, and trace data. Some of these systems are optimized for high cardinality dimensions and can better support various observability data types with advanced indexing and encoding techniques.
  • Visualization: Tools like Grafana are increasingly popular for their user-friendly dashboards and customization options. Grafana’s flexible, pluggable connectors allow querying with the native language of the chosen storage layer and support multiple query formats, including PromQL and LogQL, enabling seamless integration with different backends.

Challenges in the storage & query layer of observability

The storage and query layer is one of the most challenging components of the observability stack, directly impacting the cost, flexibility, and performance of the entire system. 

  • Cost: Monitoring distributed architectures with thousands of applications and microservices can generate daily data footprints of hundreds of terabytes. The storage system must manage vast data volumes flowing in at high velocity, which necessitates high-capacity infrastructure and drives up costs.
  • Flexibility: Diverse data formats and content complicate ingestion and querying, making scalability and flexibility essential.
  • Performance: Observability systems must deliver real-time data and fast queries to monitor health, troubleshoot incidents, and ensure reliability. At the same time, because o11y data is queried relatively infrequently, there is limited willingness to invest in infrastructure for it. Balancing stringent query and freshness SLAs with low infrastructure cost is essential.

Let’s explore how these challenges play a part in managing metrics, logs, and trace data.

Challenges of metrics data

Here’s an example of a typical metrics event. It contains a timestamp column representing the event time at millisecond granularity, metric name and metric value columns representing the metrics emitted by your system, and a labels column.

Challenges of metrics data in the storage and query layer of observability

Two things are peculiar about such metrics events. First, each row contains only a single metric. Second, the dimensions are bundled into the labels column as a JSON map containing a mixed bag of attribute key-value pairs, such as server IP, Kubernetes version, table name, and container ID.
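For illustration, a minimal sketch of such an event (all field names and values here are hypothetical) might look like:

```json
{
  "ts": 1736812800000,
  "metricName": "jvm_memory_used_bytes",
  "value": 734003200,
  "labels": {
    "server_ip": "10.0.4.12",
    "k8s_version": "1.29",
    "table_name": "orders_REALTIME",
    "container_id": "c9f3a1b2"
  }
}
```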

This structure creates complexity for ingestion and querying: 

  • If you ingest the data as-is, querying becomes cumbersome due to extensive JSON extraction requirements (see the query sketch after this list). 
  • Alternatively, you could materialize each dimension key, but this approach brings its own issues: keys are often highly dynamic and not predetermined, making materializing them impractical. Additionally, sparse keys — those that appear inconsistently across your data — can lead to excessive storage consumption and cost.
  • Adding to these challenges, all metrics are typically emitted every few seconds, which means any metric of interest is distributed across all shards of data. The result is high query fan-out: the system must scan nearly all data shards to retrieve a specific metric, which can strain both storage and query performance.
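To make the first point concrete, here is a minimal sketch of such a query, assuming a hypothetical metrics table with the columns shown above. Every filter on a label has to extract it from the JSON blob at query time:

```sql
-- Sketch only: table and column names are hypothetical.
SELECT AVG("value") AS avgValue
FROM metrics
WHERE metricName = 'jvm_memory_used_bytes'
  AND ts >= 1736812800000
  AND JSON_EXTRACT_SCALAR(labels, '$.server_ip', 'STRING') = '10.0.4.12'
```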

Challenges of logs data

Here’s an example of a log event. It includes a timestamp, various top-level attributes such as thread name, log level, and class name, and a large, unstructured text payload: the actual log line.

Challenges of logs data in the storage and query layer of observability
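As a rough sketch (field names and values are hypothetical), such an event might look like:

```json
{
  "ts": 1736812800123,
  "threadName": "query-runner-3",
  "logLevel": "ERROR",
  "className": "org.apache.pinot.core.query.scheduler.QueryScheduler",
  "logLine": "Timed out waiting for server response for requestId 4821 on table orders_REALTIME after 10000 ms"
}
```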

Handling this data presents several challenges:

  • High volume and long retention requirements: Log data volumes can be massive, and long retention periods are often needed, significantly driving up storage costs.
  • Complex querying needs: Since logs are largely unstructured, querying involves free-form text search, which can be resource-intensive. Filtering by specific attributes or performing aggregations further complicates queries, impacting speed and efficiency. Low-latency queries are critical for applications that rely on logs to detect anomalous behavior in real time. For instance, Uber’s Healthline app processes crash, error, and exception logs from internal systems to detect mobile app crashes in real time.

These factors make effective storage and query strategies essential for managing log data in a cost-effective, high-performance observability stack.

Challenges of trace data

Here’s an example of a trace event. It contains a call graph consisting of an array of spans, along with attributes associated with each span. Once again, given the semi-structured, nested nature of the payload, similar challenges arise in storing this data cost-effectively and querying it without complexity.

Challenges of trace data in the storage and query layer of observability
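A simplified sketch of such a payload (field names are hypothetical, loosely following OpenTelemetry conventions):

```json
{
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spans": [
    {
      "spanId": "00f067aa0ba902b7",
      "parentSpanId": null,
      "name": "GET /checkout",
      "startTimeMs": 1736812800123,
      "durationMs": 254,
      "attributes": { "service": "frontend", "httpStatus": 200 }
    },
    {
      "spanId": "3f9d26e5a2c84b1d",
      "parentSpanId": "00f067aa0ba902b7",
      "name": "SELECT orders",
      "startTimeMs": 1736812800150,
      "durationMs": 87,
      "attributes": { "service": "orders-db", "dbSystem": "postgresql" }
    }
  ]
}
```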

To summarize the challenges, we need a system that can:

  • Ingest diverse data formats at high velocity, ensuring seamless integration with various data sources.
  • Store petabytes of data cost-effectively, with efficient handling of long retention periods.
  • Query data with high freshness and low latency, supporting complex query patterns without sacrificing performance.


Apache Pinot as the storage & query layer

Apache Pinot offers a robust solution to the challenges of observability at scale. Pinot already addresses many of the core challenges of high-velocity, high-volume data in real-time external analytics and has been proven at scale. Below, we outline key capabilities that make Pinot particularly well-suited to handle observability requirements effectively.

Integration with real-time streaming sources

Pinot integrates very well with real-time streaming sources such as Apache Kafka, RedPanda, and Kinesis. Many companies already leverage Pinot for high-volume real-time analytics — with some deployments, such as those at LinkedIn and Uber, processing millions of events per second.

Pinot is also a highly pluggable system: support for custom decoders for special formats seen in o11y data (such as Prometheus or OTEL) was added with minimal effort. This flexibility is crucial for observability, where support for diverse data formats from different agents is essential.
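As a sketch, real-time ingestion from Kafka is configured declaratively in the streamConfigs section of the Pinot table config. The topic, broker, and threshold values below are placeholders, and the decoder class is where a custom decoder for an o11y format would be plugged in:

```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "otel-metrics",
  "stream.kafka.broker.list": "kafka:9092",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
  "realtime.segment.flush.threshold.rows": "5000000"
}
```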

Efficient, fast storage and query engine

Now, let’s discuss the actual storage engine of Pinot. As discussed, observability data contains various data types and columns, each with its own complexities. Here’s our blueprint for how Pinot can effectively address the unique storage and query requirements of each data type in observability data.

How Apache Pinot effectively addresses the unique storage and query requirements of each data type in observability data

Optimized indexing for observability queries

Indexing is one of Pinot’s standout strengths, and several indexing techniques make it especially effective for observability workloads:

  • Inverted Index: This index maintains mappings from values to document IDs, making it ideal for fast filtering on attributes like metric name, log level, and class name.
  • Sorted Index: By sorting data based on a frequently used column in filter predicates (such as metric name), this index increases data locality, reducing scan times and improving query performance. 
  • Range Index: An advanced version of the inverted index, the range index maps ranges of values to document IDs, enabling quick filtering for range-based queries like those on timestamps or metric values.
  • Timestamp Index: A specialized form of range indexing, the timestamp index materializes and indexes multiple timestamp granularities, minimizing on-the-fly computations and speeding up time-based queries.
  • JSON Index: The JSON index optimizes filtering and lookup on JSON columns by eliminating the need for expensive scans and JSON object reconstruction. Such techniques are essential for handling trace data efficiently.
  • Text Index: The text index enables free-text search on unstructured text blobs. It is effective for term, phrase, and regex match queries on logs, such as finding all logs containing ERROR or filtering logs for a particular server or application.
  • StarTree Index: The StarTree index is both a filtering and aggregation optimization that helps accelerate aggregations involving point lookups across multiple arbitrary dimensions. This is a great technique when filtering on multiple attributes of metrics data to calculate metric aggregations.

Apache Pinot's indexing techniques make it especially effective for observability workloads

You can read more about these indexes and the performance improvements they bring in our blog What Makes Pinot Fast.
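To make this concrete, here is a minimal sketch of how several of these indexes could be declared in a Pinot table config; the column names are illustrative and the exact configuration depends on your schema:

```json
"tableIndexConfig": {
  "sortedColumn": ["metricName"],
  "invertedIndexColumns": ["logLevel", "className"],
  "rangeIndexColumns": ["value"],
  "jsonIndexColumns": ["labels"],
  "starTreeIndexConfigs": [
    {
      "dimensionsSplitOrder": ["metricName", "region", "host"],
      "functionColumnPairs": ["SUM__value", "MAX__value"],
      "maxLeafRecords": 10000
    }
  ]
},
"fieldConfigList": [
  { "name": "logLine", "encodingType": "RAW", "indexTypes": ["TEXT"] }
]
```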

Partitioning for optimized query performance

In Pinot, you can partition data by space as well as time. At query time, entire server groups and segments can be pruned using this partitioning scheme, reducing the amount of work done at each step and improving query performance.

This technique is especially useful for metrics data: every query necessarily filters on a metric name, so partitioning segments by metric name drastically reduces query fan-out.
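A minimal sketch of what this could look like in the table config, with an illustrative partition function and count, plus partition-based pruning enabled at query routing:

```json
"tableIndexConfig": {
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "metricName": { "functionName": "Murmur", "numPartitions": 32 }
    }
  }
},
"routing": {
  "segmentPrunerTypes": ["partition"]
}
```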

Efficient log handling with CLP encoding

For handling log data, Pinot has adopted the Compressed Log Processor (CLP) encoding technique. CLP is a compressor designed to encode unstructured log messages in a way that makes them more compressible while retaining the ability to search them. The CLP algorithm first tokenizes the log message to extract the variable values and then generates a template.

In the example below, you can see this in action on a log line from Pinot broker logs, where the values [pinot-broker-7001, foo_OFFLINE] and [20, 72] were extracted and the template generated was “Broker \x11 took \x12 ms to execute requestId \x12 on table \x11”. The same process repeats at query time: the search phrase is tokenized to extract its values and template, and matching then becomes a predicate matching problem in Pinot.

Apache Pinot offers efficient log handling for observability workloads

The CLP technique has shown impressive improvements in compression ratio. This is mainly because the number of templates tends to remain roughly constant even as log data keeps growing in volume. Here’s a table demonstrating this on sample Pinot broker logs that we converted to a Pinot segment with CLP encoding: even though the data grew 15 times, the number of templates grew only 1.5 times.

Pinot leverages CLP encoding for efficient log handling

A benchmark published by Uber, applying this technique to Spark logs, showed a dramatic 169x compression factor compared to raw logs.

To use CLP encoding, you simply set the compression codec when defining the column in the Pinot table, and all of this processing happens transparently under the hood. You can also configure the column with a text index in Pinot, and Pinot will seamlessly use it for regex matches on the CLP structures.
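A minimal sketch of such a column definition, assuming a log table with a logLine column, combining the CLP codec with a text index:

```json
"fieldConfigList": [
  {
    "name": "logLine",
    "encodingType": "RAW",
    "compressionCodec": "CLP",
    "indexTypes": ["TEXT"]
  }
]
```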

Trace data handling with JSON support

For trace data, Pinot has support for the JSON data type and JSON index. Without such techniques, you’d need to parse the JSON and extract tokens for each row, quickly increasing costs. However, with the JSON index, every field within your JSON gets indexed. This means that searching for a deeply nested field becomes much more efficient, as it operates like an index lookup.
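For example, assuming a hypothetical traces table whose traceJson column holds payloads like the one sketched earlier, a query for all traces containing a span from a given service could look like the sketch below. With the JSON index in place, the nested-field predicate is resolved through index lookups rather than by parsing every row's payload:

```sql
-- Sketch only: table and column names are hypothetical.
SELECT JSON_EXTRACT_SCALAR(traceJson, '$.traceId', 'STRING') AS traceId
FROM traces
WHERE JSON_MATCH(traceJson, '"$.spans[*].attributes.service" = ''orders-db''')
LIMIT 10
```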

StarTree Cloud as the storage & query layer

Now let’s take a look at the additional capabilities StarTree Cloud offers to support o11y data. StarTree Cloud is a cloud-native, fully managed platform powered by Apache Pinot.

BYOC

The most important piece of the StarTree solution is the “Bring Your Own Cloud” (BYOC) deployment model. One of the main reasons traditional vendor solutions are costly is the high data egress fees incurred when transferring data from your account to the vendor’s infrastructure. With BYOC, this issue is eliminated. The entire observability stack, including the agents, stays within your account, ensuring that your data remains within your environment. This not only gives you more control over your data but also helps you avoid the additional costs associated with data transfer, making the solution far more cost-effective.

Efficient handling of attribute maps with MAP data type

To manage attribute maps (e.g., labels/dimensions in metrics data), we introduced the MAP data type in StarTree Cloud, which streamlines the ingestion and querying of key-value pairs while reducing query complexity.

For ingestion, simply set the map field to the MAP data type, and StarTree Cloud will handle the processing automatically. Querying the map is as straightforward as performing a lookup by key.
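As a sketch, assuming a MAP-typed labels column and bracket-style key lookup (check the StarTree Cloud documentation for the exact syntax in your version), the earlier label filter reduces to:

```sql
-- Sketch only: assumes a MAP column named labels with key-based item access.
SELECT AVG("value") AS avgValue
FROM metrics
WHERE metricName = 'jvm_memory_used_bytes'
  AND labels['server_ip'] = '10.0.4.12'
```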

In terms of storage, the MAP data type offers two formats for storing keys: dense and sparse. You can manually configure which keys to store as dense or sparse based on the dataset’s characteristics, or allow Pinot to infer this dynamically. Dense columns are represented similarly to fully materialized columns, providing performance benefits akin to materializing the key as an independent column. In contrast, sparse columns are stored in an EAV (Entity-Attribute-Value) format, which is far more storage-efficient than storing each key as a separate column.

How StarTree Cloud efficiently handles attribute maps for observability workloads

Here’s a comparison of storage footprints: storing all columns as dense (red) significantly increases storage costs but boosts performance. Sparse columns (blue) are more storage-efficient but come with a slight performance trade-off. The MAP data type (yellow) offers a flexible approach, allowing you to designate frequently queried columns as dense while keeping others sparse. This enables you to fine-tune the trade-off between storage efficiency and query performance, making it an optimal solution for observability use cases.

Comparison of storage footprints with StarTree Cloud

Optimizing storage costs with cloud tiered storage

One of the biggest challenges with current observability solutions is the high cost of storing and querying the extremely large volumes of data they generate. In StarTree Cloud, we address this by configuring Pinot to use multiple storage tiers, such as SSDs or disks for recent data and cloud object storage for older data, which helps reduce storage costs. This is particularly important for observability data, as you typically need to retain it for long periods, even though queries beyond the most recent few days or weeks are infrequent. Using more cost-effective storage for older data becomes crucial in managing these costs effectively.
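In open-source Pinot, tiers are expressed with tierConfigs in the table config, as in the sketch below (tier names, ages, and server tags are illustrative); in StarTree Cloud, the older tier is backed by cloud object storage rather than a second group of servers:

```json
"tierConfigs": [
  {
    "name": "hotTier",
    "segmentSelectorType": "time",
    "segmentAge": "7d",
    "storageType": "pinot_server",
    "serverTag": "hot_OFFLINE"
  },
  {
    "name": "coldTier",
    "segmentSelectorType": "time",
    "segmentAge": "30d",
    "storageType": "pinot_server",
    "serverTag": "cold_OFFLINE"
  }
]
```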

How StarTree Cloud optimizes storage costs with cloud tiered storage

We have implemented several key performance optimizations to ensure sub-second latency even when data has moved to cloud object storage. These techniques include pinning parts of the data, such as indexes, locally, and block-level fetching with pipelined fetch and execution.

Seamless integration with Grafana

On the visualization front, we have introduced first-class support for Pinot and Grafana integration. We have developed a Grafana connector for Pinot, enabling you to query Pinot using a builder interface. Additionally, we are enhancing the integration by enabling direct querying of Pinot using widely used o11y query formats like PromQL and LogQL. This support provides a seamless experience for users already familiar with these query formats.

You may also read about how Cisco moved from Elasticsearch to Apache Pinot to manage over 100 TB of telemetry data and to support Grafana and Kibana visualizations here.


Figure: Grafana-Pinot connector for querying Pinot using PromQL via Grafana

Conclusion

The challenges of inflexibility, rising costs, and data governance in traditional all-in-one solutions have driven the need for a disaggregated observability stack—one that allows you to tailor the best tools for each layer of the stack. With Apache Pinot as the backbone for the storage and query layer, and the enhancements provided by StarTree Cloud, we’ve shown how this approach addresses the demands of high-velocity, high-volume observability data while maintaining performance and cost efficiency.

Want to see how StarTree Cloud would benefit your observability workloads? Get started immediately in your own fully-managed serverless environment with StarTree Cloud Free Tier. If you have specific needs or want to speak with a member of our team, you can also book a demo.
