Watch videos on the past, present, and future of Real-Time Analytics. All in one place.
Learn about fundamental concepts and terms
Real-time analytics is a computer science discipline wherein massive amounts of data generated in a relatively short time need to be ingested, stored, and indexed, followed by additional processes that search, filter, aggregate, and process that stored data against specific queries to produce results.
User-facing analytics, or customer-facing analytics, is a subset of the domain of real-time analytics that provides massively concurrent query support.
Everything around Anomaly Detection with ThirdEye
Join Tim Berglund as he breaks down the intricacies of anomaly detection, emphasizing the pitfalls of traditional methods.
Learn from practical examples of implementing automated monitoring and anomaly detection using StarTree ThirdEye.
StarTree's ThirdEye, an automated anomaly detection system that operates seamlessly with Apache Pinot, transforms your e-commerce operations by identifying and flagging potential fraudulent activities in real time.
Process petabytes of data in a cost-efficient manner with Tiered Storage
Sovrn is leveraging Tiered Storage to manage data storage, which allows them to save money on cloud storage and keep their customers' data for when they need it.
Tim and Neha explain the need for a system that combines the speed of tightly-coupled systems and the cost-efficiency of decoupled systems.
See how the concept of Tiered Storage in StarTree Cloud compares to the wizarding world of Harry Potter in this lightboard video.
Get your data into Apache Pinot with the no-code, self-service tool
Introducing StarTree Data Manager: a no-code, self-service tool that helps users of all calibers quickly get started with Pinot.
StarTree’s Data Manager is a no-code, self-service tool that helps users of all calibers quickly get started with Pinot.
Learn what's new for THE real-time database
Announcing Apache Pinot 1.0! 1.0 has introduced new features to support query-time native JOINs, upsert capabilities, NULL value support in queries, and more.
Explore the pivotal features driving this milestone release. From amplified real-time analytics to robust querying capabilities, we'll navigate the core highlights of Apache Pinot 1.0.
Hear from LinkedIn, Uber, StarTree, and more about what they have in store for Pinot this year. Explore what other community members are working on and hear what the community wants to see in Pinot.
Industry leaders that are using Apache Pinot
Will Gan (Software Engineer, DoorDash) focuses on two use cases at DoorDash: Mx Portal Ads Campaign Reporting and Risk Platform Dashboarding.
Why Beaconstac moved from Elasticsearch to Apache Pinot for their Analytics Needs
StarTree CEO, Kishore Gopalakrishna, discusses Apache Pinot and performance comparisons with Apache Druid at Crunch Data Conference 2018 in Budapest.
Hear from StarTree customers about their real-time analytics use cases
Hear how Sovrn, a leader in the AdTech industry, partnered with StarTree to help them bring real-time analytics to their customers.
Leon Graveland highlights Just Eat Takeaway.com's data-driven evolution amidst pandemic-induced challenges.
A podcast dedicated to bringing analytics from the dashboard to the user interface
Dive into the world of advanced SQL querying with Elon Azoulay, a software engineer at Starburst.
Discover how this advanced feature optimizes OLAP databases, balancing storage and high-speed query performance.
Discover how to calculate the perfect cluster size for your real-time analytics requirements and explore essential technical KPIs like read throughput, write throughput, and data size.
Delve into Apache Pinot's advanced features including its pluggable architecture, upserts, and Kafka integration.
Tim and guest Neha Pawar explore Apache Pinot’s unique capabilities in real-time analytics. Neha unpacks Pinot's efficiency, low latency, and high throughput, revealing its prowess in offering real-time insights to end users.
Johan Adami, a seasoned software engineer from Stripe, shares his experience building out Pinot as an internal service for enhanced real-time analytics.
Join Tim Berglund and Guru Sattanathan as they navigate through the intriguing phases of data silos, application integrations, and the inevitable rise of real-time analytics.
From the early days of KSQL to the cutting-edge work with DeltaStreams, they dive deep into the evolution and impact of real-time analytics, streaming SQL, and cloud-native data solutions.
Delve into the intriguing intersection of data mesh and event streaming with Hubert Dulay, a developer advocate at StarTree and the author of "Streaming Data Mesh."
Discover the secrets behind Wix's cutting-edge real-time analytics.
Dive into Lakshmi's journey from Kafka to Flink and finally to Pinot, understanding the growth and development of real-time analytics in the payment sector.
Short tutorials on Apache Pinot and StarTree Cloud
Mark Needham is back with another StarTree Recipe, this time about using JSON indexes in Apache Pinot. He first demonstrates how to ingest semi-structured data into Apache Kafka using the kcat command line tool, before showing how to configure Apache Pinot to ingest the data from Kafka. He then illustrates querying the Pinot table using the JSON match function and details the handling of arrays, nested data, and exclusion of fields. The tutorial underscores the improved efficiency and flexibility of using JSON indexes introduced in Pinot 0.12.
In this StarTree Recipe, Mark Needham explains how to merge segments in real-time tables in Apache Pinot, which is worthwhile because small segments can lead to higher query latency. He demonstrates how to use the kcat tool to set up the Pinot schema, followed by the table config, which includes the MergeRollupTask. After checking data ingestion through the Pinot UI, Mark shows how to manually run the MergeRollupTask, and what it looks like when the segments are merged successfully. This process, when done properly, can improve query performance significantly.
Mark is back with another recipe where he explains the process of rolling up segments in real-time tables in Apache Pinot. This process aims to reduce the amount of space that data occupies by aggregating it up to the nearest minute, hour, or even day. He uses an example involving product purchases streamed in Kafka, which are aggregated according to parameters such as minimum stock, sum of quantity, maximum price, and sum of sales amount. Needham also demonstrates how these rollups are configured in Apache Pinot, and visualized via the Pinot UI, while explaining the concept of task types, ingestion configuration, segment names, and the influence of time intervals on aggregation.
Join Mark Needham from StarTree as we delve into filtering data streams during ingestion in Apache Pinot! With help from a flight-related dataset, we'll learn how to filter events on their way into Apache Pinot without needing to use a stream processor. Whether you're a beginner or experienced with Apache Pinot, this tutorial is packed with hands-on insights to elevate your streaming data game.
Dive deep into the world of full upserts in Apache Pinot with Mark Needham. This tutorial unveils the process of modifying and updating records, emphasizing real-time data streaming. Utilizing a NASDAQ stock prices simulator and Apache Kafka, Mark illustrates the distinction between full and partial upserts, the role of Redpanda Keeper in topic management, and the importance of consistency in partition keys and Pinot configurations.
Mark Needham explains multi-volume support in Apache Pinot, with the help of an example showing how to store hot and cold data on
In this video we'll do a deep dive into Apache Pinot's Geospatial index, learning when it gets used, as well as what happens when the query engines are querying a geoindexe
Join Mark Needham as he unravels the intricacies of partial upserts in Apache Pinot. In this comprehensive guide, Mark demonstrates the process using a simulated data generator mimicking motor traffic at imaginary junctions. You’ll be walked through the creation of a Kafka topic with Redpanda Keeper, and witness real-time data streaming using jq and kcat. Discover the significance of the junction ID as a key and delve deep into the configuration and optimization of Pinot schema for efficient data handling. Watch as he navigates through table configs, ensuring optimal settings for partial upsert functionality. Witness data transformation and understand the dynamics of min/max speeds and vehicle counts. Lastly, explore the Pinot UI and query execution, offering insights into real-time data evolution.
This video is all about the segment threshold in Apache Pinot. Mark Needham explains what it is, why we should care, and how to go about configuring it.
In this video we explore the types of geospatial objects that can be stored in Apache Pinot. We also learn about the geospatial index and use it to write a super fast query to find the points within 50km of the centre of San Francisco.
Dive deep into data serialization and ingestion with Mark Needham in this informative session on ingesting Avro encoded data into Apache Pinot. Learn to generate data in Avro format using the Confluent Kafka library, create topics with Redpanda Keeper, and navigate real-time data streaming. This walkthrough covers transforming Avro to Pinot schema and explores telemetry data and streaming configurations. Perfect for those looking to master real-time data handling and processing.
You can find our entire video library over on YouTube