High-Performance Full Text Search Directly on Iceberg via Lucene-Integrated Indexing

A technical discussion on how Apache Pinot enables high-performance full-text search on Apache Iceberg by integrating Lucene indexes with Iceberg/Parquet data, reducing scan-heavy queries without heavy re-ingestion or data duplication.

Date

June 17, 2026

Time

1:00 PM EDT

While data lakehouses built on Apache Iceberg provide massive, cost-effective scalability, they are fundamentally designed for scan-heavy workloads. They lack the sub-second, “needle-in-a-haystack” full-text search and selective lookups that inverted indexes provide in traditional search engines. This session explores how Apache Pinot fills this gap by integrating Apache Lucene indexes directly into its distributed serving layer while keeping the source of truth in Iceberg’s Parquet format.

We will conduct a technical deep-dive into:

Segment-to-Parquet Virtualization: How Pinot maps its segment abstraction onto remote Iceberg/Parquet files without data duplication or heavy re-ingestion.
Hybrid Index Pinning: The mechanics of pinning Lucene Inverted and Text Indexes to local NVMe storage on Pinot servers while leaving the raw data blobs on S3.
Lucene I/O Orchestration: How Pinot optimizes query plans to minimize S3 “Time to First Byte” by leveraging metadata-heavy index structures.
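
To make the idea behind these topics concrete, here is a minimal Python sketch of why an inverted index avoids a full scan: the index lives locally and maps terms to row ids, so only the matching rows need to be fetched from remote storage. This is a toy illustration, not Pinot's or Lucene's implementation. The names REMOTE_ROWS, build_inverted_index, search, and fetch_rows are hypothetical, and a plain dictionary stands in for Parquet files on S3.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for rows held in remote Parquet files (row id -> text).
REMOTE_ROWS = {
    0: "disk latency spike on node a",
    1: "user login succeeded",
    2: "timeout while reading from object store",
    3: "user login failed: bad password",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(rows):
    """Map each term to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for term in tokenize(text):
            index[term].add(row_id)
    return index


def search(index, query):
    """Return sorted row ids containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


def fetch_rows(row_ids, rows=REMOTE_ROWS):
    """Simulate reading only the matching rows from remote storage."""
    return [rows[row_id] for row_id in row_ids]


# The index is built once and kept "local"; the raw rows stay "remote".
LOCAL_INDEX = build_inverted_index(REMOTE_ROWS)
matching_ids = search(LOCAL_INDEX, "login failed")
matching_rows = fetch_rows(matching_ids)
```

In this sketch the query touches two small posting sets and then reads a single row, rather than scanning every row's text. A real system also has to handle scoring, phrase queries, and index persistence, which are out of scope here.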
