
Clickstream and Funnel Analysis with Apache Pinot and StarTree


Released on July 16, 2024

In the digital age, understanding user behavior on websites and applications has become crucial for businesses to optimize their digital strategies. Clickstream analysis, a method of analyzing the data produced by users as they navigate through a website or application, provides valuable insights into user behavior. Funnel analysis, a subset of clickstream analysis, focuses on tracking the steps users take to complete a specific goal, such as making a purchase or signing up for a service.

Apache Pinot, a real-time distributed OLAP datastore, is designed to scale horizontally and is well-suited for clickstream and funnel analysis due to its ability to handle large volumes of data with low latency.

In this blog post, we explore how Apache Pinot can be used for clickstream and funnel analysis and walk through an example funnel. This is part one of a two-part series; part two will cover how to perform clickstream and funnel analysis hands-on with Apache Pinot and StarTree.

The benefits of clickstream and funnel analysis

Today, clickstream analysis encompasses not only website interactions but also a wide range of digital touchpoints, including mobile apps, social media platforms, and online marketplaces. It is used to track customer journeys across multiple channels, understand product usage patterns, measure marketing campaign effectiveness, and optimize sales funnels. As businesses continue to invest in data analytics, clickstream analysis is expected to further evolve, providing deeper insights into customer behavior and driving strategic decision-making across various business functions.

Let’s consider an example of an e-commerce website to demonstrate clickstream analysis in action. Suppose we have the following events in our clickstream data:

  1. Add to Cart: Represents a user adding a product to the cart.
  2. Checkout: Represents a user proceeding to the checkout page.
  3. Purchase: Represents a user completing the purchase.

If you want to understand user behavior across each step of your checkout process, Apache Pinot makes it incredibly easy to perform funnel analysis.
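To make the three-step funnel above concrete, here is a minimal sketch in plain Python, not Pinot's own query syntax. It assumes each event is a hypothetical `(user_id, event_type, timestamp)` tuple and that the event names `add_to_cart`, `checkout`, and `purchase` stand in for the steps listed above; the sample data is made up for illustration.

```python
from collections import defaultdict

# Funnel steps from the example above, in order (names are assumed).
FUNNEL_STEPS = ["add_to_cart", "checkout", "purchase"]


def funnel_counts(events, steps=FUNNEL_STEPS):
    """Count distinct users who reach each funnel step in order.

    `events` is an iterable of (user_id, event_type, timestamp) tuples.
    A user counts for step N only after reaching steps 0..N-1 earlier.
    """
    by_user = defaultdict(list)
    for user_id, event_type, ts in events:
        by_user[user_id].append((ts, event_type))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        next_step = 0  # index of the step this user must hit next
        for _, event_type in sorted(user_events):  # replay in time order
            if next_step < len(steps) and event_type == steps[next_step]:
                counts[next_step] += 1
                next_step += 1
    return dict(zip(steps, counts))


# Illustrative data only.
SAMPLE_EVENTS = [
    ("u1", "add_to_cart", 1),
    ("u1", "checkout", 2),
    ("u1", "purchase", 3),
    ("u2", "add_to_cart", 1),
    ("u2", "checkout", 5),
    ("u3", "add_to_cart", 2),
    ("u4", "checkout", 1),  # no prior add_to_cart, so not counted
]

if __name__ == "__main__":
    print(funnel_counts(SAMPLE_EVENTS))
```

Each user is counted at most once per step, and a step only counts if the previous steps happened earlier for the same user. That ordering requirement is what separates a funnel from a simple count of events by type.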

Analytical questions on clickstream funnel data

Using the clickstream and funnel analysis example, businesses can answer the following analytical questions:

  1. Conversion Rate Analysis: What is the overall conversion rate from adding a product to the cart to completing a purchase? How does this conversion rate vary across different product categories or user segments?
  2. Drop-off Analysis: At which step of the funnel do users most commonly drop off? Are there specific products or pages that experience higher drop-off rates?
  3. Behavioral Analysis: How do user behaviors differ between those who complete the purchase and those who abandon their carts? Are there patterns in the types of products viewed or the time spent on each page?
  4. Session Analysis: How many users complete the entire funnel in a single session, and how does this number change over time or based on user demographics?
  5. Funnel Optimization: What are the key areas of the funnel that could be optimized to improve the overall conversion rate? For example, are there bottlenecks in the checkout process that could be streamlined?
  6. Campaign Performance: How do marketing campaigns or promotions impact the conversion rate at each step of the funnel? Are certain campaigns more effective at driving conversions than others?
  7. User Journey Analysis: What are the most common paths users take through the website or app before completing a purchase? Are there opportunities to personalize the user experience based on these paths?
  8. Seasonal Trends: How do user behaviors and conversion rates change during peak shopping seasons or holidays? Are there products or categories that experience higher demand during these times?
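As a rough illustration of the conversion rate and drop-off questions above, the following sketch turns per-step user counts into step-to-step conversion and drop-off rates. The counts, step names, and function names are hypothetical and chosen for illustration; in practice the counts would come from a query against your clickstream table.

```python
def step_conversion(counts):
    """Return conversion and drop-off rates between consecutive steps.

    `counts` is an ordered list of (step_name, users_reached) pairs.
    If a step has zero users, the conversion out of it is reported as 0.0.
    """
    report = []
    for (prev_name, prev_n), (name, n) in zip(counts, counts[1:]):
        rate = n / prev_n if prev_n else 0.0
        report.append({
            "from": prev_name,
            "to": name,
            "conversion": rate,
            "drop_off": 1.0 - rate,
        })
    return report


def overall_conversion(counts):
    """Share of users at the first step who also reach the last step."""
    first, last = counts[0][1], counts[-1][1]
    return last / first if first else 0.0


def worst_step(counts):
    """Transition with the highest drop-off rate."""
    return max(step_conversion(counts), key=lambda r: r["drop_off"])


# Made-up numbers for illustration only.
SAMPLE_COUNTS = [("add_to_cart", 1000), ("checkout", 400), ("purchase", 300)]

if __name__ == "__main__":
    for row in step_conversion(SAMPLE_COUNTS):
        print(row)
    print("overall:", overall_conversion(SAMPLE_COUNTS))
    print("largest drop-off:", worst_step(SAMPLE_COUNTS)["from"])
```

Grouping the same counts by product category, campaign, or time window before calling these functions would address the segment, campaign, and seasonal variants of the questions.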

Competitive landscape for clickstream analytics

Amplitude, Mixpanel, and similar tools have long been the go-to solutions for clickstream funnel analytics, offering comprehensive, all-in-one platforms. These tools provide extensive features for tracking user interactions, analyzing behavior, building cohorts, and generating insights to inform product decisions. However, their monolithic nature often leads to vendor and data lock-in, limiting flexibility and adaptability.

The challenge: Monolithic nature and lock-in

While these tools are powerful, their monolithic nature poses several challenges:

  • Vendor Lock-In: Relying on a single provider can be risky and costly. Transitioning away from these tools can be complex and expensive.
  • Data Lock-In: Data stored in proprietary formats can make it difficult to migrate to other platforms or integrate with other tools.
  • Lack of Flexibility: Customization options are often limited, restricting the ability to tailor the solution to specific needs.
  • Cost: Comprehensive platforms can be expensive, especially for smaller organizations or those with specific use cases.

The emerging trend: Disaggregated analytics stack

 

The trend is now shifting towards a more disaggregated approach, offering greater flexibility and control:

  • Best-of-Breed Components: Organizations can choose the best tool for each specific function (e.g., data ingestion, storage, and visualization).
  • Interoperability: Open standards and APIs enable seamless component integration, reducing vendor and data lock-in.
  • Cost-Effective: By selecting only the necessary components, organizations can optimize costs.
  • Scalability and Customization: Disaggregated stacks allow for more tailored solutions, scaling and evolving as needs change.

StarTree: Enabling the disaggregated analytics stack

 

StarTree Cloud, powered by Apache Pinot, is a robust, fully managed real-time analytics platform designed for high performance and scalability. StarTree offers a compelling solution for those looking to build a disaggregated analytics stack:

  • Data Ingestion: Simplified ingestion from various sources, ensuring optimal data flow into the Pinot table.
  • Real-Time Analytics: Leverages Apache Pinot for the low-latency, real-time analytics that timely insights and decision-making depend on.
  • Scalability: Designed to handle massive data volumes, ensuring high performance at scale.
  • Flexibility: Allows integration with other best-of-breed tools, such as Tableau, Grafana, and Superset, so the analytics stack stays adaptable.
  • Open Source Foundation: Built on open-source technologies, promoting transparency and community-driven innovation.

Summary

The move towards a disaggregated analytics stack is transforming the way organizations approach product analytics. By breaking free from the constraints of monolithic solutions like Amplitude and Mixpanel, businesses can achieve greater flexibility, reduce costs, and avoid vendor lock-in. StarTree’s approach enables this transformation, offering a scalable, high-performance, and adaptable solution for modern analytics needs.

StarTree provides a fully managed offering and much more on top of Apache Pinot — check out the advantages of StarTree Cloud over open source Pinot. Leveraging StarTree’s robust set of indexes and the Data Sketches library, you can take advantage of Apache Pinot’s fast and scalable architecture. This enables you to analyze billions of rows of data for thousands of concurrent users with query latency in single-digit milliseconds.

Stay tuned for part two of this blog series to learn how to leverage Apache Pinot and StarTree Cloud for clickstream analysis.

In the meantime, if you’re interested in trying a fully-managed version of Apache Pinot, check out StarTree Cloud Free Tier. Our free serverless offering is perfect for development and prototyping — and lets you start running queries in minutes.
