
What Is Real-Time Analytics?

Real-time analytics refers to a type of data analysis focused on delivering insights to end users and consumers in real time. Where batch analytics and reporting of data once measured turnaround in hours or days, in recent years the pace of business has accelerated to require results in second or even subsecond scales. Companies now require far faster analytical speeds and face far greater challenges in managing large quantities of data.

Every event, from a simple online search to a meal delivery request, holds the potential to shape decisions and outcomes. Real-time data analytics provides the ability to ingest data as soon as events happen and make it available for querying as soon as it’s ingested.

This article introduces the concept of real-time analytics and shares the benefits and common use cases. Additionally, we explore how an analytics database designed for real-time data compares to traditional Online Analytical Processing (OLAP) databases and how to determine when your use case needs real-time data analytics.

What does “real-time” mean?

The value of an event is at its highest when it happens, providing insights into the current state of the world. However, this value decreases over time, especially in scenarios like meal delivery, where the urgency and relevance of an event (such as a user searching for food) declines rapidly.

Real-time analytics time and value relationship

Real-time applications and systems are designed to keep the delay between when an event occurs and when it is observed and acted upon as low as technically possible, and to keep any delay as predictable as possible. When applied to analytics, “real-time” generally means that results based on fresh data return to a user-facing application within anywhere from under 100 ms (faster than the blink of an eye) to a few seconds at most, even for queries against very large (petabyte-scale) data sets.

Real-time data vs. batch data

Today, many organizations use both real-time data and batch data for data analysis. Batch data refers to data that is processed as a group or in a set, typically over a specified time interval. Batch processing is a fundamental concept in traditional data processing systems, data warehousing, and ETL workflows.

Real-time data, as we outlined above, involves the continuous flow of data and immediate availability of said data for analysis. Technologies like streaming platforms, real-time databases, and analytics frameworks are commonly used to handle and process real-time data efficiently.

In summary:

The batch process is intermittent — it operates periodically based on predetermined time intervals or on demand.

The streaming process is continuous — it responds to events as they happen.
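
To make the distinction concrete, here is a minimal Python sketch. The event shape and the names `batch_totals` and `StreamingTotals` are hypothetical and not taken from any particular product; the only point is when results become available for querying.

```python
from collections import defaultdict

# Hypothetical events: (timestamp_seconds, restaurant, order_amount)
EVENTS = [
    (1, "pizza", 20.0),
    (2, "sushi", 35.0),
    (3, "pizza", 15.0),
]


def batch_totals(events):
    """Batch: process the whole accumulated set in one periodic run."""
    totals = defaultdict(float)
    for _, restaurant, amount in events:
        totals[restaurant] += amount
    return dict(totals)


class StreamingTotals:
    """Streaming: update the running result as each event arrives."""

    def __init__(self):
        self.totals = defaultdict(float)

    def on_event(self, event):
        _, restaurant, amount = event
        self.totals[restaurant] += amount
        # The updated totals are queryable right after this event is ingested.
        return dict(self.totals)


stream = StreamingTotals()
snapshots = [stream.on_event(e) for e in EVENTS]
```

Both approaches end with the same totals. The difference is that the streaming version exposes an up-to-date answer after every event, while the batch version has nothing new to report until its next scheduled run.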

What is a real-time analytics database?

Real-time analytics databases are a subset of Online Analytical Processing (OLAP) databases, designed specifically for fast, complex, and massively concurrent queries against datasets measured in terabyte to petabyte scales.

Traditional OLAP databases were designed to handle batch analytics, where data was imported in bulk from transactional systems — for example, hourly, nightly or weekly batch updates. Since data may have been updated many times since then, such data can be considered “stale.”

Real-time analytics system detailing freshness, latency, and concurrency

Real-time analytics databases go beyond this: they are designed to handle the low-latency, high-throughput data ingestion requirements of event streaming data and Change Data Capture (CDC). Real-time OLAP systems may ingest data at the rate of hundreds of thousands or millions of objects (or events) per second, and are thus able to provide better data “freshness,” ensuring users have access to the most current information for data analysis.

Comparing real-time OLAP databases and OLTP databases

Databases are broadly sorted into two main categories: Online Transaction Processing (OLTP) and OLAP. Real-time analytics databases fall on the OLAP side and are thus classified as real-time OLAP (RTOLAP). Organizations usually run both OLTP and OLAP databases for their respective purposes. While there are real-time OLTP databases, it is important to understand why they are not appropriate for real-time analytics use cases.

OLTP databases are designed to process single-record transactions on a row-by-row basis, and thus are often called row store databases. They aren’t optimized to store or index data for complex analytical queries, which require large ranges or full table scans, as well as various forms of pre-computation, such as aggregations to accelerate complex or frequent queries. They also lack flexible indexing strategies.
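
The pre-computation idea mentioned above can be sketched in a few lines of Python. This is an illustration only: the table, the column names, and the helper functions are hypothetical, and real analytics databases use far more sophisticated structures than a dictionary.

```python
from collections import defaultdict

# Hypothetical raw order rows: (city, cuisine, amount)
ROWS = [
    ("NYC", "pizza", 20.0),
    ("NYC", "sushi", 35.0),
    ("SF", "pizza", 15.0),
    ("NYC", "pizza", 10.0),
]


def revenue_by_scan(rows, city):
    """Answer the query by scanning every raw row; work grows with row count."""
    return sum(amount for c, _, amount in rows if c == city)


def build_rollup(rows):
    """Pre-compute SUM(amount) per city once, for example at ingestion time."""
    rollup = defaultdict(float)
    for city, _, amount in rows:
        rollup[city] += amount
    return dict(rollup)


ROLLUP = build_rollup(ROWS)


def revenue_by_rollup(city):
    """Answer the same query with a single lookup in the pre-aggregated result."""
    return ROLLUP.get(city, 0.0)
```

The scan touches every row on every query, whereas the rollup pays the aggregation cost once and then serves each query with a lookup. That trade-off is what makes pre-computation attractive for frequent analytical queries.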

Some small-scale or infrequent analytical workloads can be run against production OLTP systems. However, if you run frequent and heavyweight analytical queries on an OLTP-oriented database, the transactions themselves suffer, resulting in excessive retries and timeouts, and potentially even data loss. Moreover, even a scaled-up OLTP system may never achieve the performance or efficiency of a purpose-built real-time OLAP database.

Thus, users who start out running real-time analytics against OLTP systems will most often eventually add an OLAP system and migrate their analytical workloads to it, to avoid overloading their OLTP systems. The data in the real-time analytics database is kept in sync with the OLTP system via CDC and event streaming.
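
As a rough illustration of CDC-based syncing, the sketch below replays a stream of change events against an analytics-side copy of a table. The event format (`op`, `key`, `row`) is a simplified assumption made for this example; real CDC tools define their own, richer formats.

```python
def apply_change(table, change):
    """Apply one change event to an in-memory copy keyed by primary key."""
    op, key = change["op"], change["key"]
    if op in ("insert", "update"):
        table[key] = change["row"]  # upsert the latest version of the row
    elif op == "delete":
        table.pop(key, None)  # tolerate deletes for keys we never saw
    else:
        raise ValueError(f"unknown op: {op}")
    return table


# Hypothetical change events emitted by the transactional (OLTP) system.
CHANGES = [
    {"op": "insert", "key": 1, "row": {"status": "placed", "total": 20.0}},
    {"op": "insert", "key": 2, "row": {"status": "placed", "total": 35.0}},
    {"op": "update", "key": 1, "row": {"status": "delivered", "total": 20.0}},
    {"op": "delete", "key": 2},
]

analytics_copy = {}
for change in CHANGES:
    apply_change(analytics_copy, change)
```

After the replay, the analytics copy reflects the current state of the source table without the analytical system ever querying the OLTP database directly.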

OLAP vs. real-time OLAP

OLAP databases enable you to process data with historical context and train machine learning models. OLAP use cases typically cover long-running queries that finish in minutes or hours to use in next-day reports or occasionally refreshed dashboards.

Real-time OLAP data stores, on the other hand, are intended to serve multi-dimensional data in real time at lower latencies measured in seconds or milliseconds. They also support significantly more end users than traditional OLAP, reflected by high rates of queries per second (QPS) measured in the thousands to hundreds of thousands.

Is user-facing analytics the same as real-time analytics?

User-facing analytics, or customer-facing analytics, is a subset of the domain of real-time analytics. User-facing analytics systems are real-time data systems capable of supporting high concurrency and massive data scale. Not all real-time data systems are capable of user-facing analytics: some are limited in scale (bound by constraints of memory or processing power, or by usage-based pricing) and are therefore incapable of supporting massive concurrency.

Real-time analytics capabilities vs. user-facing analytics capabilities

User-facing analytics give users access to real-time data and analytics from within the product application or platform where they can take action. One of the best examples is LinkedIn’s “Who Viewed My Profile” application, which gives 700 million-plus LinkedIn users access to fresh insights.

Benefits of real-time analytics databases

Real-time data analytics is crucial in situations where timely decision-making is essential, and organizations need to respond rapidly to changing conditions or events.

This type of analytics offers unique benefits, including:

  • Combining real-time and batch data sources: These databases provide a complete, accurate, timely view of the state of your enterprise by combining real-time data and historical data.
  • Support for different data sources: OLAP databases should be able to handle semi-structured data and unstructured data without significant degradation in query performance.
  • Highly scalable: These databases are able to quickly and effectively query terabytes to petabytes of data in seconds.
  • High performance: A real-time OLAP database can ingest real-time event streams of millions of events per second.
  • High concurrency: These databases are designed to support thousands or hundreds of thousands of simultaneous queries without performance bottlenecks.
  • Cost-effective: Optimized and built for purpose, real-time analytics databases can reduce infrastructure spend compared with systems not designed for these workloads.
  • Fault tolerant: An OLAP database should provide highly resilient cluster management to mitigate risk and potential points of failure.

Use cases for real-time analytics

Real-time OLAP databases can support a diverse set of use cases that require high throughput, low query latency, and the ability to ingest real-time data from multiple sources as well as historical data.

Common industry use cases include:

  • Financial services: Analyzing real-time market data to mitigate risks and prevent fraud in financial trading, monitoring transactions, and serving user reports and dashboards.
  • E-commerce and retail: Analyzing customer behavior, providing personalized recommendations, managing inventory.
  • Food delivery and rideshare: Monitoring sales and inventory, tracking orders, powering merchant dashboards, detecting unexpected driver delays.
  • Media and social media: Tracking engagement, identifying trends, content performance metrics, serving user dashboards.
  • Observability and cybersecurity: Analyzing network traffic, monitoring user behavior, using machine learning to detect and mitigate cyber threats.
  • Supply chain and logistics: Tracking shipments, managing inventory levels, and adjusting delivery routes in real-time.
  • Telecommunications: Optimizing service quality, identifying and addressing network issues, preventing fraud.
  • Healthcare: Monitoring patient data, tracking services in intensive care units and emergency rooms.
  • Internet of Things (IoT) devices: Processing and analyzing data from IoT devices and sensors, location-based services, machine health, geospatial analysis.

When do I need a real-time analytics database?

Many organizations begin with just a single database, such as an OLTP database capable of real-time data processing of requests — purchases, status updates, user preferences and the like. Not only does it run all of the organization’s transactions, but it also serves as a reporting system.

However, as an organization grows, running both transactions and analytics may become so taxing that the database suffers performance degradation even after it is scaled. At that point, the organization tends to split off its analytics to a data warehouse. However, data warehouses typically serve only batch analytics and non-real-time reporting.

Moving from an OLTP database to a data warehouse

The next step is that organizations may add on some sort of event streaming and stream processing capabilities. This allows real-time updates and simple data transformations and aggregations, but it is usually an intermediary step.

Adding event streaming capabilities to allow for real-time insights

The final transformation is when organizations employ real-time data analytics side-by-side with their real-time transactional database systems, as the diagram below shows.

Employing real-time analytics side by side with real-time transactional database systems

Real-time OLAP systems can also incorporate part of the data from the data warehouse, for instance to compare live, real-time data against historical data and expected trends.

If you’re still unsure whether you need real-time OLAP, consider whether the following factors and metrics apply to your use case:

  • Low latency: Needing data results in seconds to milliseconds
  • Increasing concurrency: Requiring support for many concurrent users and high QPS
  • Increasing demand for data freshness: Needing a way to get data to business users and decision makers in a timely manner
  • Combining real-time and historical data sources: Connecting to both real-time streaming systems and data warehouses without compromising query latency or concurrency
  • Oversized infrastructure: Finding your infrastructure (and bill) ballooning in size and becoming increasingly complex to handle

Learn more in our blog on 5 signs your use case needs real-time OLAP.

Several databases and data processing technologies are popular for real-time data analytics due to their ability to handle high-velocity, real-time data streams and provide low-latency access to insights.

Three popular open-source real-time OLAP databases are Apache Pinot, Apache Druid, and ClickHouse. All three are column-oriented databases, which are well suited to efficiently scanning only the selected columns of data, minimizing storage footprint, and scanning through highly redundant data entries.
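
The following Python sketch illustrates why a column-oriented layout helps with these access patterns. It is a conceptual toy, not how any of these databases stores data on disk: the table and function names are hypothetical, and run-length encoding is shown only as one common way that redundant column values can be stored compactly.

```python
# Hypothetical table stored row by row, as a transactional system might.
ROW_STORE = [
    {"order_id": 1, "city": "NYC", "amount": 20.0},
    {"order_id": 2, "city": "NYC", "amount": 35.0},
    {"order_id": 3, "city": "NYC", "amount": 10.0},
    {"order_id": 4, "city": "SF", "amount": 15.0},
]


def to_column_store(rows):
    """Pivot rows into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


COLUMN_STORE = to_column_store(ROW_STORE)


def total_amount_rows(rows):
    """Row layout: every record is visited to read a single field."""
    return sum(row["amount"] for row in rows)


def total_amount_columns(columns):
    """Column layout: only the 'amount' column is read."""
    return sum(columns["amount"])


def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [(value, count) for value, count in encoded]


city_runs = run_length_encode(COLUMN_STORE["city"])
```

Both layouts give the same total, but the columnar version never touches `order_id` or `city` to compute it. The repetitive `city` column also collapses from four values to two runs, which hints at how redundancy within a column can reduce the storage footprint.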

For a comprehensive overview of these three databases, check out our comparison of Apache Pinot, Apache Druid, and ClickHouse.

Real-time analytics with Apache Pinot

Apache Pinot is one of the most popular open-source software (OSS) real-time OLAP databases available. It provides massive concurrency for user-facing analytics with a wide range of indexing for fast-performing queries. Apache Pinot is best known for the star-tree index, which improves performance and provides greater flexibility and efficiency over traditional methods like materialized views and OLAP cubes.

Learn more about Apache Pinot to find out why Pinot has a competitive advantage over other real-time OLAP databases.

Apache Pinot at scale

Apache Pinot’s flexible, powerful architecture is well suited to real-time data workloads. Capable of powering upwards of 100k QPS with a p99 latency of 100 ms, Pinot delivers high query accuracy and high performance regardless of query complexity. These capabilities, along with its native integrations, make Pinot a strong fit for a variety of real-time data use cases.
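
For readers unfamiliar with the term, a p99 latency is the value that 99% of queries complete within. The sketch below computes it with the nearest-rank method over a list of measured latencies. The sample data and the `percentile` helper are hypothetical and are not Pinot benchmark results.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that at least pct% of samples do not exceed."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical query latencies in milliseconds: 98 fast queries and 2 slow ones.
LATENCIES_MS = [20] * 98 + [90, 400]

p50 = percentile(LATENCIES_MS, 50)
p99 = percentile(LATENCIES_MS, 99)
meets_target = p99 <= 100  # compare against a 100 ms p99 target
```

Tail percentiles such as p99 are used instead of averages because a small number of very slow queries can be hidden by a healthy-looking mean, yet still be experienced by many users at high query rates.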

Leading organizations, including LinkedIn, Uber, Stripe, Walmart, and WePay, use Apache Pinot to power their real-time analytics use cases. See how some of these companies are using Pinot:

Apache Pinot at LinkedIn: LinkedIn uses Apache Pinot to power many of their user experiences as well as their internal reporting platform. Any number that you see on LinkedIn, whether you go to “Who Viewed My Profile?” or your LinkedIn feed, is querying Pinot behind the scenes.

Apache Pinot at Uber: Uber relies on Apache Pinot to power many different systems. If you open the Uber Eats app and see orders near you, you are making a query to Pinot.

Apache Pinot at Stripe: Stripe utilizes Pinot to power user-facing interactions and internal use cases, and Pinot also supports various platform offerings within Stripe. Almost every transaction in Stripe is stored in Pinot, and they’re able to achieve a sub-second latency at this scale.

Apache Pinot across LinkedIn, Stripe, and Uber

Get fully-managed Apache Pinot with StarTree Cloud

Explore the benefits of StarTree Cloud, an advanced real-time analytics platform powered by Apache Pinot. This fully managed Database-as-a-Service (DBaaS) provides streamlined data management as well as unique features such as simplified real-time data ingestion and modeling, anomaly detection, and root cause analysis through StarTree Data Manager and ThirdEye.

Learn more about StarTree Cloud
