Empowering Real-Time Security Visibility at Slack with Apache Pinot

In the 2025 Pinot Year in Review, engineers at Slack shared how Apache Pinot enabled them to build external, user-facing real-time analytics features that support customer security by letting customers monitor data exfiltration as it happens.

Ingestion Latency: <1s

Data Accuracy Against Iceberg Tables (Downstream): 100%

Query Latency SLA: <10s

For over three years, Slack has utilized Apache Pinot to power customer-facing dashboards and analytics. Historically, these analytics relied on batch processing, where data flowed through Spark into S3 and was ingested into Pinot offline servers. This batch-based approach resulted in a data delay of 24 to 48 hours.

Challenge

Slack identified a significant visibility gap for their enterprise customers regarding data security. Customers lacked insight into how much of their data—specifically, the volume of messages and files—was being exported from their Slack instance and consumed by external Slack apps.

Because this pertained to security monitoring, the existing batch-processing latency of one to two days was unacceptable. Customers needed the ability to react to data consumption as it occurred, rather than discovering it after the fact.

Solution

In Q3 of 2025, Slack launched its first large-scale, external user-facing real-time analytics feature to solve this problem. By upgrading their infrastructure, they transitioned from a purely batch-based system to a robust real-time architecture.

Real-Time Ingestion Pipeline: Instead of routing data through the overnight Spark/S3 batch process, the new solution publishes events to Kafka. Pinot real-time servers then consume those events directly from Kafka.
Managing High Compute at Scale: The system ingests 400 to 500 million records into Pinot in real time. To calculate the distinct number of messages and files consumed by external apps, Slack required highly compute-intensive aggregations on multi-value fields and arrays.
Algorithmic and Infrastructure Optimization: To handle these heavy distinct count queries at scale, Slack used Pinot’s native HyperLogLog (HLL) functions (DISTINCTCOUNTHLL for single-value columns and DISTINCTCOUNTHLLMV for multi-value columns) to return approximate counts rather than exact ones. They further tuned query performance by adding range, sorted, and inverted indexes.
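
To illustrate why HyperLogLog makes distinct counts over multi-value fields cheap, here is a minimal from-scratch sketch in Python. It is not Pinot's implementation and not Slack's code. The class name, the `approx_distinct_mv` helper, the `message_ids` field, and the choice of 12 index bits are all assumptions made for illustration. The idea is that each value is hashed, a few hash bits select a register, and each register keeps only the longest run of leading zeros it has seen, so memory stays fixed no matter how many values arrive.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative only, not Pinot's implementation)."""

    def __init__(self, p: int = 12):
        self.p = p                     # number of hash bits used to pick a register
        self.m = 1 << p                # number of registers (4096 when p = 12)
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value: str) -> None:
        # Derive a 64-bit hash from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


def approx_distinct_mv(records, field, p: int = 12) -> int:
    """Approximate the number of distinct items in a multi-value field across records."""
    sketch = HyperLogLog(p)
    for record in records:
        for item in record[field]:   # each record carries an array of values
            sketch.add(item)
    return sketch.count()


if __name__ == "__main__":
    # Hypothetical events: each one lists the message IDs an app read.
    events = [
        {"message_ids": ["m1", "m2", "m3"]},
        {"message_ids": ["m2", "m3", "m4"]},
    ]
    print(approx_distinct_mv(events, "message_ids"))
```

With 4096 registers the expected relative error is roughly 1.6% (about 1.04 divided by the square root of the register count), which is the usual trade-off: a small, bounded error in exchange for fixed memory and fast merging across segments.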

Results and Impact

Dramatically Reduced Latency: The shift to the Kafka-Pinot pipeline reduced data landing latency from one to two days to less than one second. If an external app requests messages, the event appears in Pinot almost instantly.
High-Speed Querying: Despite the massive data volume and compute-intensive aggregations, Slack achieves a query latency of less than 10 seconds.
Enhanced Customer Security: Enterprise customers now have the immediate visibility they need to monitor data exfiltration and act upon suspicious app behavior in real time.
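
To make the query side concrete, the following Python sketch builds a Pinot SQL statement of the kind such a dashboard might issue. `DISTINCTCOUNTHLLMV` is a real Pinot aggregation function, but everything else here is hypothetical: the table name, the `app_id`, `team_id`, `message_ids`, `file_ids`, and `event_time_ms` columns, and the helper itself are assumptions, not Slack's actual schema or code.

```python
def build_app_consumption_query(table: str, team_id: str, since_ms: int, limit: int = 20) -> str:
    """Build a Pinot SQL query ranking external apps by approximate distinct
    messages and files consumed. All table and column names are hypothetical."""
    # Basic validation, since the values are interpolated into the SQL text.
    if not table.isidentifier():
        raise ValueError("table must be a simple identifier")
    if not team_id.isalnum():
        raise ValueError("team_id must be alphanumeric")
    since_ms = int(since_ms)
    limit = int(limit)
    return (
        "SELECT app_id, "
        "DISTINCTCOUNTHLLMV(message_ids) AS approx_messages, "
        "DISTINCTCOUNTHLLMV(file_ids) AS approx_files "
        f"FROM {table} "
        f"WHERE team_id = '{team_id}' AND event_time_ms >= {since_ms} "
        "GROUP BY app_id "
        "ORDER BY DISTINCTCOUNTHLLMV(message_ids) DESC "
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Hypothetical table name and team ID.
    print(build_app_consumption_query("app_data_access_events", "T12345", 1767225600000))
```

In a layout like this, the equality filter on `team_id` and the range filter on `event_time_ms` are the kinds of predicates that inverted, sorted, and range indexes are meant to speed up, while the HLL aggregation keeps the distinct counts over the multi-value columns affordable.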

Future Outlook

Having successfully established this real-time foundation, Slack is highly confident in Pinot’s capabilities. A major focus for 2026 is to heavily expand their real-time analytics use cases. To achieve this, Slack is currently onboarding Apache Flink, positioning a combined stack of Kafka, Flink, and Pinot to solve the majority of their future real-time analytics needs.
