How Real-Time Gen AI Pipelines Deliver Business-Wide Value
Session title: Building Real-Time Gen AI Pipelines with Apache Pinot and AWS
Real-time, vector-powered artificial intelligence has become a necessity, driven by the speed at which data moves through business channels, customer conversations, and supply chains.
Those pipelines are used not just for storing data but for acting on it as it arrives, said Nolan Chen, an AWS Partner Solutions Architect. They’re the glue that enables informed and immediate decisions across a wide spectrum of applications, from fraud detection to personalized product recommendations.
Vector embeddings and the Pinot advantage
Vectors are the magic ingredient in real-time generative AI. A vector is a list of numbers representing a point in multidimensional space. A vector that represents a piece of content, such as a product description, image, or comment, is called an embedding. Embeddings stored in vector databases enable similarity search: a technique for finding the items in a dataset that are most similar to a query item, based on some measure of distance or similarity. The technique is commonly used in image retrieval, recommendation systems, and use cases where multiple keywords may apply to a concept.
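To make that concrete, here is a minimal sketch of similarity search over embeddings using cosine similarity; the four-dimensional vectors and catalog items are invented for illustration, since real embedding models emit hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, near 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional embeddings; real models emit hundreds of dimensions.
catalog = {
    "running shoes":  np.array([0.9, 0.1, 0.0, 0.2]),
    "trail sneakers": np.array([0.8, 0.2, 0.1, 0.3]),
    "coffee maker":   np.array([0.0, 0.9, 0.8, 0.1]),
}
query = np.array([0.85, 0.15, 0.05, 0.25])  # e.g. an embedded "jogging footwear"

# Rank catalog items by similarity to the query embedding.
for name, vec in sorted(catalog.items(),
                        key=lambda kv: cosine_similarity(query, kv[1]),
                        reverse=True):
    print(f"{name}: {cosine_similarity(query, vec):.3f}")
```

Note how the two shoe items score far above the coffee maker even though the query shares no keywords with them; that is the flexibility keyword search lacks.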
Similarity search is far more flexible and powerful than keyword search, but also slower and more resource-intensive. “If you try to compare every new query vector to a billion stored vectors, it’s just not scalable,” said Raj Ramasubbu, an AWS Solutions Architect.
Most vector databases update in batches, ingesting new data once an hour or even less frequently. That lag makes them ill-suited to real-time generative AI pipelines. Apache Pinot added support for real-time vector ingestion in 2023, “making similarity search a real-time operation,” Chen said.
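As a rough sketch of what querying Pinot’s vector index can look like from Python, the snippet below uses the pinotdb client; the table, column names, and query vector are hypothetical, and the VECTOR_SIMILARITY predicate follows the syntax documented for Pinot’s vector index, so verify it against your Pinot version.

```python
from pinotdb import connect  # pip install pinotdb

# Hypothetical broker address and schema; adjust to your deployment.
conn = connect(host="localhost", port=8099, path="/query/sql", scheme="http")
curs = conn.cursor()

query_vector = [0.12, 0.87, 0.44]  # in practice, the embedded user query

# VECTOR_SIMILARITY is Pinot's vector-index predicate; here it asks for
# the 10 stored embeddings nearest to the query vector.
curs.execute(f"""
    SELECT product_id, description
    FROM products
    WHERE VECTOR_SIMILARITY(embedding, ARRAY{query_vector}, 10)
    LIMIT 10
""")
for row in curs:
    print(row)
```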
Near neighbors
The heart of similarity search is K-Nearest Neighbors (KNN), a supervised machine learning algorithm that finds the closest data points, or neighbors, to a given input and makes predictions based on the majority class or average value of those neighbors.
“If you try to use exact KNN to calculate the distance between your query vector and a billion vectors, it will incur a lot of latency,” Ramasubbu said. To compensate, Pinot uses approximate nearest neighbor (ANN) algorithms: graph-based methods built on Hierarchical Navigable Small World (HNSW), which look for “close enough” matches rather than the absolute closest and are far less computationally expensive than exact KNN. Pinot supports four distance metrics, giving developers a choice suited to different business contexts. “It’s a trade-off between accuracy and speed,” he said. “For real-time use cases, speed wins.”
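The accuracy-versus-speed trade-off is easy to see in code. The sketch below contrasts exact brute-force search with an HNSW index built via the hnswlib library; the random data and the M and ef_construction parameters are illustrative defaults, not tuned values.

```python
import numpy as np
import hnswlib  # pip install hnswlib

dim, n = 128, 100_000
rng = np.random.default_rng(42)
data = rng.random((n, dim), dtype=np.float32)
query = rng.random(dim, dtype=np.float32)

# Exact KNN: compare the query against every stored vector (O(n) per query).
dists = np.linalg.norm(data - query, axis=1)
exact_top10 = np.argsort(dists)[:10]

# Approximate NN: HNSW builds a layered proximity graph and visits only a
# small neighborhood of it per query, trading a little recall for speed.
index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))
index.set_ef(50)  # search-time breadth: higher = more accurate, slower
approx_top10, _ = index.knn_query(query, k=10)

print("overlap with exact result:", len(set(exact_top10) & set(approx_top10[0])), "of 10")
```

The set_ef knob is the trade-off Ramasubbu describes in miniature: raise it when accuracy matters more than latency, lower it when speed wins.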
The AWS engineers showed an example of a streaming data pipeline that supports generative AI. Data is ingested via Amazon Kinesis or Amazon Managed Streaming for Apache Kafka (MSK), a managed service that simplifies building and running real-time applications on Kafka.
Ingested data is broken into chunks, then embedded using models such as Amazon Titan. The embeddings are stored in Apache Pinot for real-time vector search. User queries are also embedded and matched against Pinot’s vector index. The most relevant content is packaged into a prompt and fed to a Gen AI model such as Anthropic’s Claude 3.7 Sonnet via Amazon Bedrock.
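A condensed sketch of the retrieve-and-generate half of that pipeline, using boto3 against Amazon Bedrock: the Titan and Claude model IDs follow AWS’s published naming (and may vary by region), the retrieval step is stubbed out, and the prompt format is illustrative only.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")  # credentials/region from your environment

def embed(text: str) -> list[float]:
    """Embed text with Amazon Titan Text Embeddings V2."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def answer(question: str, context_docs: list[str]) -> str:
    """Pack retrieved context into a prompt and call Claude 3.7 Sonnet."""
    prompt = (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context_docs) +
        f"\n\nQuestion: {question}"
    )
    resp = bedrock.converse(
        modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",  # ID varies by region
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]

# In the full pipeline, embed(question) would be matched against Pinot's
# vector index; here the retrieval step is stubbed with a static document.
print(answer("What is Apache Pinot?", ["Pinot is a real-time OLAP datastore."]))
```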
Real-time customer sentiment
AWS Streaming Solutions Architect Francisco Morillo showed how this approach can be used to pull real-time data from Reddit to detect customer sentiment and enable businesses to respond rapidly.
Comments from the AWS subreddit are captured and funneled through Kafka into Apache Flink, a distributed stream processing framework. Irrelevant or duplicate data is filtered out, and the remaining comments are embedded using Amazon Titan and stored in Pinot’s real-time vector store.
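The demo ran this step in Flink, but the shape of the logic fits in a short sketch; the plain-Python Kafka consumer below stands in for the Flink job, and the topic name, broker address, and filtering rules are placeholders.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Placeholder topic/broker; the session's demo used MSK and Flink for this step.
consumer = KafkaConsumer(
    "reddit-comments",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

seen_ids: set[str] = set()  # naive dedupe; Flink would use keyed state

for msg in consumer:
    comment = msg.value
    # Drop duplicates and obviously irrelevant records before embedding,
    # so the vector store holds only useful, unique content.
    if comment["id"] in seen_ids or len(comment.get("body", "")) < 20:
        continue
    seen_ids.add(comment["id"])
    # Next steps (per the pipeline above): embed comment["body"] with
    # Titan, then write the vector plus metadata to Pinot for real-time search.
    print("keep:", comment["id"])
```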
Morillo demonstrated that a query made before real-time ingestion was enabled returned no results, while the same query afterward produced detailed, context-aware summaries sourced directly from Reddit.
The information has value across the business. “You capture sentiment, sarcasm, emojis, and such and use it to inform everything from customer support to product development,” Morillo said.
Spectrum of uses
The speakers outlined four compelling use cases for vector embeddings in Pinot.
- Real-time semantic search lets customers find exactly what they’re looking for in product catalogs or documentation using vector similarity.
- Personalized recommendation engines deliver tailored content based on real-time interaction data, not just historical clicks.
- Customer support chatbots empower agents with live data to answer questions accurately and quickly.
- Dynamic pricing allows sellers to adjust product prices in real time based on market trends, sentiment, demand, and other factors.
Real-time conversations on platforms such as X, Reddit, and Instagram have become powerful drivers of opinion and sales. Businesses that tap into them can move quickly to deliver new products, head off negative sentiment, and double down on positive feedback. “It’s not just about what you know,” Chen said. “It’s about how fast you can know it—and act on it.”
Moving Forward
Although Apache Pinot predates the rise of LLMs, its architecture, built for data freshness, speed, scale, and concurrency, is a natural fit for modern AI workloads. With these new AI capabilities, the industry is entering an AI-native era, and Pinot is positioned to power it. StarTree recently announced support for the Model Context Protocol (MCP), with native vector embedding support coming in Fall 2025.