
Using StarTree Cloud to Get the Most Out of Apache Pinot

Overview
Apache Pinot is an open-source, distributed database known for its speed and efficiency in handling large-scale, high-dimensional data. StarTree offers a managed cloud solution for Apache Pinot, with enterprise-grade features to simplify data pipeline management. If you already like Apache Pinot, you’ll love using it via StarTree Cloud.
Rather than covering all of Pinot’s capabilities, this blog focuses on the benefits of adopting StarTree Cloud, backed by real-world use cases.
For those unfamiliar with Apache Pinot, this video offers a great introduction:
What is StarTree Cloud?
StarTree Cloud is a fully managed platform for Apache Pinot. Here’s a simplified overview of its additional capabilities, above and beyond Apache Pinot OSS:
- StarTree extensions: Proprietary features that enhance Pinot’s core functionality, making it more cost-effective, faster, and easier to manage. Examples include StarTree Cloud Tiered Storage, which allows you to take advantage of the cost savings from object stores like S3; StarTree Upserts, which let you persist and scale upserts to billions of records per server; and new indexing strategies like Sparse Index to optimize performance.
- StarTree ThirdEye: StarTree’s flagship application for detecting real-time anomalies in time-series data stored in Pinot. It also allows for interactive root cause analysis across multi-dimensional data.
- StarTree Data Manager: A self-service ingestion tool that enables users to fetch data from multiple sources (streaming, batch, SQL), perform data wrangling, and customize advanced Pinot configurations through an intuitive UI.
- Data source connectors: Libraries and connectors for seamless data ingestion and distribution.
- Tooling: Utilities for data migration, advanced cluster operations, performance analysis, and more.
- Security: StarTree Cloud supports custom OIDC-compatible identity providers (IdPs) for user authentication and offers Role-Based Access Control (RBAC) to manage access policies. StarTree also adheres to security standards such as SOC 2 and ISO 27001.
Deployment models
StarTree Cloud offers two main deployment models:
- BYOC (Bring Your Own Cloud): The data remains within your infrastructure (data plane), while StarTree manages the control plane securely on its own infrastructure.
- SaaS (Software as a Service): A fully managed solution where the entire Pinot cluster is hosted on StarTree infrastructure for simplicity and ease of use.
For more details on these models, refer to this blog post.
Other key differentiators between StarTree Cloud and open source Apache Pinot
Beyond the differentiators discussed above, StarTree Cloud offers additional benefits that are often overlooked:
- 24×7 support backed by stringent Service Level Agreements (SLAs): This is essential for mission-critical use cases where downtime is not an option. The Site Reliability Engineering (SRE) team at StarTree actively monitors cluster infrastructure, offering both proactive and reactive support as needed. Numerous alerts are configured on the cluster to prevent and mitigate unplanned issues.
- Fully managed service: The StarTree team tackles all administrative work of cluster and systems management. Software upgrades, backups and restorations, compactions and other tasks are all seamlessly performed for you.
- Dedicated consultancy: Subscribers to StarTree Cloud gain access to expert consultancy from the original developers of Apache Pinot. This service, though not always highlighted, adds immense value, especially for enterprises with complex use cases.
Real-world use cases: Migrating from open source Pinot to StarTree Cloud
Leading payment provider in Asia
Razorpay, a fast-growing payments startup in India, handles massive traffic on its analytics platform, ingesting millions of events per second. Latency is critical to the success of their platform.
"We never had true real-time analytics per se before Apache Pinot. We had makeshift solutions that were based on Elasticsearch and Postgres. But in those systems, the average data freshness was around 15 to 20 minutes, because those [datasets] were still running as batches. With Pinot, we have a real-time stack based on Spark Streaming and Pinot."

While their open-source Apache Pinot cluster was functioning well, Razorpay engaged with StarTree to optimize performance further. Key improvements included:
- Infrastructure optimization: Razorpay relied heavily on upserts, a feature of open source Apache Pinot, but the large volume of primary keys held in heap memory caused memory issues. StarTree’s Off-Heap Upserts, which use a scalable, persistent data store to handle large volumes of upserts, resolved this and significantly reduced the memory footprint.
- Improved availability: Razorpay initially had a replication factor of 1. Through StarTree’s consultation, they applied proper indexing strategies and optimized server distribution. This allowed them to increase the replication factor to 3 while simultaneously reducing resource utilization.
- Proprietary indexes: To enhance query performance, StarTree’s proprietary Sparse Index was applied. This index is a hybrid between inverted indexes and Bloom filters, optimized for accelerating queries that filter using equality (or IN) conditions on high-cardinality columns, such as those storing large sets of random values, UUIDs, or public IP addresses.
- Tiered storage: Apache Pinot’s data-intensive operations can lead to significant storage costs. With Tiered Storage enabled in StarTree Cloud, older, infrequently accessed data is offloaded to S3 instead of being stored locally. Almost all indexes remain available on the tiered storage, which maintains strong performance while significantly reducing storage costs.
- Additional optimizations:
  - Segment sizes have been optimized to enable faster remote reads of index buffers.
  - Heap memory configurations have been adjusted to better align with the dataset characteristics.
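To make the idea behind a Bloom-filter-assisted index more concrete, here is a toy Python sketch. It is not StarTree’s Sparse Index implementation, whose internals are not described in this post. It only illustrates the general technique of checking a small probabilistic filter to skip segments that cannot contain a value, and consulting an inverted index only in the segments that might. The `BloomFilter`, `Segment`, and `lookup` names, the filter sizes, and the sample IP addresses are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(value))


class Segment:
    """Hypothetical stand-in for a data segment with one indexed column."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)
        self.bloom = BloomFilter()
        self.inverted = {}  # value -> list of row ids (a tiny inverted index)
        for row_id, value in enumerate(self.values):
            self.bloom.add(value)
            self.inverted.setdefault(value, []).append(row_id)


def lookup(segments, wanted):
    """Return (segment name, row ids) for rows where the column equals wanted."""
    hits = []
    for seg in segments:
        if not seg.bloom.might_contain(wanted):
            continue  # definitely absent: skip this segment entirely
        rows = seg.inverted.get(wanted, [])
        if rows:  # a Bloom false positive yields no rows and is dropped here
            hits.append((seg.name, rows))
    return hits


segments = [
    Segment("seg_0", ["10.0.0.1", "10.0.0.2", "10.0.0.1"]),
    Segment("seg_1", ["172.16.0.9", "192.168.1.5"]),
]
result = lookup(segments, "10.0.0.1")
print(result)
```

In this toy version the filter costs a fixed number of bits per segment regardless of how many distinct values the column holds, which is why the approach suits equality lookups on high-cardinality columns such as UUIDs or IP addresses.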
As a result of these optimizations, Razorpay improved performance and reduced their total cost of ownership, cutting infrastructure costs by more than 50%.
Additional information on Razorpay’s use case and their utilization of StarTree Cloud can be found here.
Leading ride-sharing company
A leading Southeast Asian ride-sharing company, offering services such as Deliveries, Mobility, Financial Services, and more, uses Apache Pinot for several key use cases, including market and persona analysis. To enhance their analytics capabilities, they initiated a proof of concept (POC) on StarTree Cloud with a minimal cluster. Based on an in-depth assessment, the StarTree team recommended the following key improvements:
- StarTree Upserts vs. OSS Upserts: The company relied heavily on upserts in Apache Pinot but faced memory issues due to the high volume of primary keys. OSS upserts keep primary keys in heap memory, which proved insufficient for their needs. By switching to StarTree’s off-heap upserts, they improved memory efficiency and overall performance, which in turn significantly reduced infrastructure costs.
- Optimizing the star-tree index: Although they had implemented Apache Pinot’s star-tree index on most of their tables, performance was not meeting expectations. After reviewing the setup with the customer, the StarTree team identified additional optimizations to their star-tree index configurations, leading to better performance.
- Anomaly detection: Given the nature of their data, real-time anomaly detection is critical. Using StarTree’s ThirdEye Anomaly Detection tool, they were able to seamlessly configure and deploy an effective real-time anomaly detection system.
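For readers who have not configured one, the sketch below shows roughly what a star-tree index entry looks like in an open source Apache Pinot table config, built here as a Python dictionary and printed as JSON. The post does not say which settings were tuned for this customer, so the column names (`city`, `service_type`, `vehicle_class`, `fare`) and the `star_tree_config` helper are hypothetical and for illustration only. The keys themselves follow the `starTreeIndexConfigs` section of Pinot’s `tableIndexConfig`.

```python
import json


def star_tree_config(dimensions, metrics, max_leaf_records=10000):
    """Build one entry for tableIndexConfig.starTreeIndexConfigs.

    dimensions: dimension columns, in the order the tree should split on them.
    metrics: (aggregation function, column) pairs to pre-aggregate.
    """
    return {
        "dimensionsSplitOrder": list(dimensions),
        "skipStarNodeCreationForDimensions": [],
        # Pinot writes each pair as FUNCTION__column, e.g. SUM__fare.
        "functionColumnPairs": [f"{fn}__{col}" for fn, col in metrics],
        "maxLeafRecords": max_leaf_records,
    }


# Hypothetical ride-sharing columns, for illustration only.
config = star_tree_config(
    dimensions=["city", "service_type", "vehicle_class"],
    metrics=[("SUM", "fare"), ("COUNT", "*")],
)
table_index_config = {"starTreeIndexConfigs": [config]}
print(json.dumps(table_index_config, indent=2))
```

Tuning typically revolves around which dimensions appear in the split order, which aggregations are pre-computed, and how many records a leaf node may hold, since those choices trade index size against query latency.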
As a result of the POC, the company scaled up their cluster to support all their use cases, gradually migrating the majority of their analytical workloads to StarTree Cloud, with plans to onboard the remaining use cases over time.
More case studies
StarTree empowers enterprises across various industries to expand their analytical capabilities while significantly reducing infrastructure costs and improving query performance. Explore more compelling case studies and success stories here.
Get started with StarTree Cloud today
Interested in how your organization can improve queries and reduce costs with StarTree Cloud? Contact us for a demo. We’d love to listen to your needs, understand your use case, and answer any questions you may have about whether StarTree Cloud is the best database for your real-time analytics needs. You can also get started immediately in your own fully managed serverless environment with StarTree Cloud Free Tier.