Announcing Apache Pinot 0.7.1


Xiang Fu
released on
June 22, 2021

We are excited to announce that Apache Pinot 0.7.1 was released in April 2021. For readers new to this blog, Apache Pinot is a real-time distributed datastore designed to answer OLAP queries with low latency. This release introduces several new features, including a JSON index, lookup-based join support, geospatial support, and TLS support for Pinot connections, along with various performance optimizations and improvements. It also adds new APIs for managing segments and uploading data to an offline table, and it contains many critical bug fixes.

JSON Index

A JSON string can represent an array, map, or nested field without forcing a fixed schema. It is very flexible, but it comes at a cost: filtering on a JSON string column is very expensive.

Without an index, Pinot has to parse the JSON string of every record to reconstruct the JSON object, look up the key, and compare its value against the filter. Pinot’s new JSON index is designed to accelerate filtering on JSON string columns by avoiding that per-record scanning and reconstruction.
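
To make that cost concrete, here is a minimal Python sketch of an unindexed filter over a JSON string column. It is a toy model, not Pinot code: the table is a hypothetical list of JSON strings, and get_path and scan_filter are illustrative helpers. Every query parses every row.

```python
import json

# Hypothetical toy table: each row stores its payload as a raw JSON string.
ROWS = [
    '{"name": {"first": "adam", "last": "smith"}, "age": 30}',
    '{"name": {"first": "bob", "last": "jones"}, "age": 25}',
    '{"name": {"first": "adam", "last": "lee"}, "age": 41}',
]


def get_path(obj, path):
    """Follow a dotted key path (e.g. "name.first") into a parsed JSON object."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def scan_filter(rows, path, value):
    """Unindexed filter: parse every row, walk to the key, compare the value.

    Returns the matching row ids and how many JSON parses were needed.
    """
    matches = []
    parses = 0
    for doc_id, raw in enumerate(rows):
        obj = json.loads(raw)  # reconstruct the JSON object for every record
        parses += 1
        if get_path(obj, path) == value:
            matches.append(doc_id)
    return matches, parses


print(scan_filter(ROWS, "name.first", "adam"))  # ([0, 2], 3)
```

The parse count grows with the table, regardless of how few rows match.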

Example Query:

Let’s consider the following JSON document ingested into a column of a Pinot table.

Sample JSON structured document ingested into a Pinot table column

With a JSON index on that column, we can filter on fields inside the JSON document about as quickly as we can filter on any other indexed column in a Pinot table.
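
The idea behind such an index can be sketched in a few lines. This toy Python model is hypothetical and is not Pinot’s implementation: it parses each document once at build time, flattens it into dotted key paths, and maps each (path, value) pair to the set of matching document ids, so a filter becomes a single lookup instead of a full scan. Pinot’s actual JSON index is more involved, especially for arrays and combined predicates.

```python
import json
from collections import defaultdict


def flatten(obj, prefix=""):
    """Yield (dotted_path, leaf_value) pairs for every leaf in a nested JSON value."""
    if isinstance(obj, dict):
        for key, child in obj.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        # Simplification of this sketch: all array elements share one "[*]" path.
        for child in obj:
            yield from flatten(child, f"{prefix}[*]")
    else:
        yield prefix, obj


def build_json_index(rows):
    """Parse each row once, at build time, and map (path, value) -> set of row ids."""
    index = defaultdict(set)
    for doc_id, raw in enumerate(rows):
        for path, value in flatten(json.loads(raw)):
            index[(path, value)].add(doc_id)
    return index


def index_filter(index, path, value):
    """Indexed filter: one dictionary lookup, no JSON parsing at query time."""
    return sorted(index.get((path, value), set()))


# Hypothetical sample rows.
ROWS = [
    '{"name": {"first": "adam", "last": "smith"}, "tags": ["a", "b"]}',
    '{"name": {"first": "bob", "last": "jones"}, "tags": ["b"]}',
    '{"name": {"first": "adam", "last": "lee"}, "tags": []}',
]
index = build_json_index(ROWS)
print(index_filter(index, "name.first", "adam"))  # [0, 2]
print(index_filter(index, "tags[*]", "b"))        # [0, 1]
```

The parsing cost moves from query time to ingestion time, which is the trade-off any index makes.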

Sample search of a JSON document's structure

The text of this query, along with the full JSON index documentation, is available here: JSON Index

Lookup-based Join Support

Pinot 0.7.1 adds lookup-based joins through a new SQL UDF named LOOKUP, which is straightforward to get started with. Before this function, Pinot could not join tables in a single SQL query by default; you needed an external engine such as Presto. LOOKUP can only join in values from a dimension table; other table types are not currently supported. You can find more details about dimension tables in our documentation.

Example Query:

Sample SQL query

The SQL query above joins the dimension table dimBaseballTeams into the fact table baseballStats on the teamID key. The LOOKUP(…) function then returns the value of the teamName column for each row.
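
Semantically, LOOKUP behaves like a per-row key lookup against the dimension table. The Python sketch below models that behavior with a hash map keyed on teamID. The rows are hypothetical sample data, the helper names are illustrative, and returning None for a missing key is an assumption of this sketch rather than documented Pinot behavior.

```python
# Hypothetical sample rows; the real tables ship with the JoinQuickstart.
DIM_BASEBALL_TEAMS = [
    {"teamID": "BOS", "teamName": "Boston Red Sox"},
    {"teamID": "NYA", "teamName": "New York Yankees"},
]
BASEBALL_STATS = [
    {"playerName": "Player A", "teamID": "BOS"},
    {"playerName": "Player B", "teamID": "NYA"},
    {"playerName": "Player C", "teamID": "XXX"},  # no matching team
]


def build_dim_lookup(rows, key_column):
    """Index the dimension table by its primary key (a plain dict in this sketch)."""
    return {row[key_column]: row for row in rows}


def lookup(dim_index, column, key):
    """Return `column` from the dimension row with this key, or None if absent."""
    row = dim_index.get(key)
    return None if row is None else row.get(column)


def query(fact_rows, dim_index):
    """Model of selecting playerName, teamID, and the looked-up teamName per fact row."""
    return [
        (r["playerName"], r["teamID"], lookup(dim_index, "teamName", r["teamID"]))
        for r in fact_rows
    ]


dim = build_dim_lookup(DIM_BASEBALL_TEAMS, "teamID")
for row in query(BASEBALL_STATS, dim):
    print(row)
```

Each fact row costs one hash lookup against the dimension table, so the fact table never has to be shuffled or re-partitioned in this model.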

Joins in Apache Pinot differ from the joins you might write against a relational database: LOOKUP enriches each row of a table with values from a dimension table, matched on a join key, rather than joining two arbitrary tables. This long-awaited UDF is performant and convenient for scalable OLAP joins, and it benefits from the same advanced indexing capabilities that make Pinot’s real-time analytical SQL queries fast.

To see the function in action, run the JoinQuickstart and try a query in the Query Console:

Apache Pinot Query Console view

To launch the JoinQuickstart, run the following command with Docker:

$ docker run -p 9000:9000 apachepinot/pinot:latest QuickStart -type join

Geospatial Support

To use this feature in 0.7.1, you need a transform function in the schema definition for your table.

The first things to add to your schema definition file to enable geolocation-based queries are your latitude and longitude fields. These fields are imported from your data source, whether it is an offline (batch) source or a stream.

A snippet of two imported fields representing your latitude and longitude with matching names from your original data source

Your list of fields contains fields that are either imported by their unique name or generated during ingestion by a transform function. List both the longitude and latitude fields, as shown above (lon, lat). Nothing special is happening here yet; to run real-time geospatial queries on these values, you also need to generate a new field, which I’ve named location_st_point in the snippet below.
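
Conceptually, the generated field is a per-record function of the two imported fields, evaluated at ingestion time. The Python sketch below models that step with hypothetical helpers. The byte layout (two little-endian doubles) is an assumption for illustration only; it is not Pinot’s actual geometry serialization, which is produced by the transform function configured in the schema.

```python
import struct


def to_point_bytes(lon: float, lat: float) -> bytes:
    """Hypothetical encoding: pack (lon, lat) as two little-endian doubles."""
    return struct.pack("<dd", lon, lat)


def from_point_bytes(blob: bytes) -> tuple:
    """Inverse of to_point_bytes, for inspecting the generated value."""
    return struct.unpack("<dd", blob)


def apply_transforms(record: dict) -> dict:
    """Derive location_st_point from the imported lon/lat fields of one record."""
    out = dict(record)  # imported fields are kept as-is
    out["location_st_point"] = to_point_bytes(record["lon"], record["lat"])
    return out


row = apply_transforms({"lon": -122.4194, "lat": 37.7749})
print(from_point_bytes(row["location_st_point"]))  # (-122.4194, 37.7749)
```

The imported lon and lat columns stay in the record; the transform only adds a new, query-ready column next to them.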

To be clear, both snippets come from the same configuration block in your schema definition file.

A snippet of a generated field in a schema definition file that produces a byte field containing a queryable geolocation

With the necessary additions in the schema configuration file, we can move on to updating the table configuration that references this schema. The changes are simple and are shown below. Details may change in future versions, so it’s always worth checking the most recent version of the Apache Pinot documentation.

Sample table configuration updates

To copy the text of these configurations, visit our documentation on this feature.

The final step to enable indexing on geospatial fields is to modify your table configuration with the settings shown above.

That’s it! Once you’ve created your schema and table in Pinot with the above configurations, you can start ingesting the now-indexed geospatial data and run geospatial queries in real time.
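
A typical geospatial query filters rows by their distance from a reference point (for example, with Pinot’s ST_Distance function). The Python sketch below shows what such a filter computes, using the haversine great-circle formula and a mean Earth radius of 6,371 km, both assumptions of this sketch. It evaluates every row, which is exactly the work a geospatial index aims to avoid; it is not Pinot’s implementation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an assumption of this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_distance(rows, lon, lat, max_m):
    """Brute-force filter: keep rows whose (lon, lat) lies within max_m meters."""
    return [r for r in rows if haversine_m(r["lon"], r["lat"], lon, lat) <= max_m]


# Hypothetical rows: one at the reference point, one a degree of latitude away.
rows = [{"id": 1, "lon": 0.0, "lat": 0.0}, {"id": 2, "lon": 0.0, "lat": 1.0}]
print(round(haversine_m(0.0, 0.0, 0.0, 1.0)))                # 111195
print([r["id"] for r in within_distance(rows, 0.0, 0.0, 50_000)])  # [1]
```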

Check out the full feature blog post for geospatial support in Pinot 0.7.1, which continues with an exploration of the H3 indexing system and its origins at Uber.

TLS Support

Support for TLS-secured connections was also added in 0.7.1. TLS is configured through a set of new (or refactored) properties. An existing cluster can be upgraded to TLS safely and without downtime. To perform a live upgrade, follow these steps:

  • First, configure alternate ingress ports for https/netty-tls on brokers, controllers, and servers. Restart the components with a rolling strategy to avoid cluster downtime.
  • Second, manually verify that HTTPS access to controllers and brokers is live. Then configure all components to prefer TLS-enabled connections (while still allowing unsecured access), and restart the components again with the same rolling strategy.
  • Third, disable insecure connections via configuration. You may also have to set controller.vip.protocol and controller.vip.port and update the configuration files of any ingestion jobs. Restart components a final time and verify that insecure ingress via HTTP is no longer available.

You can find the complete release notes and the pull request for this feature here.
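
The reason for this ordering can be checked with a small toy model. The Python sketch below uses hypothetical names, not Pinot configuration: each component has a plaintext port, a TLS port, and an outbound protocol preference. Each phase is applied as a rolling restart, one component at a time, and after every restart the model checks that all components can still reach each other. In the model, the documented order never breaks connectivity, while reordering the phases does.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Toy model of a controller, broker, or server."""
    name: str
    http_open: bool = True    # plaintext ingress accepted
    tls_open: bool = False    # TLS ingress (https / netty-tls) accepted
    prefer_tls: bool = False  # outbound calls use TLS when True


def can_connect(caller: Component, target: Component) -> bool:
    # Assumption of this model: a caller uses exactly one protocol, chosen by its preference.
    return target.tls_open if caller.prefer_tls else target.http_open


def cluster_ok(components) -> bool:
    """True if every component can reach every other component."""
    return all(can_connect(a, b) for a in components for b in components if a is not b)


def enable_tls_ingress(c: Component) -> None:
    c.tls_open = True    # step 1: add alternate TLS ingress ports


def prefer_tls_egress(c: Component) -> None:
    c.prefer_tls = True  # step 2: prefer TLS, plaintext still accepted


def disable_http_ingress(c: Component) -> None:
    c.http_open = False  # step 3: turn off insecure ingress


def rolling_upgrade(components, phases) -> bool:
    """Apply each phase one component at a time; fail on the first broken link."""
    for phase in phases:
        for comp in components:
            phase(comp)
            if not cluster_ok(components):
                return False
    return True


def new_cluster():
    return [Component("controller"), Component("broker"), Component("server")]


print(rolling_upgrade(new_cluster(), [enable_tls_ingress, prefer_tls_egress, disable_http_ingress]))  # True
print(rolling_upgrade(new_cluster(), [enable_tls_ingress, disable_http_ingress, prefer_tls_egress]))  # False
```

The model ignores external clients such as ingestion jobs, which is why the third step also calls for updating their configuration.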


Special thanks

We would like to take a moment to thank the Pinot community for supporting the project. Commit activity has stayed steady over the past year, and more and more excellent features continue to land in the project. Thank you to everyone who contributed to this release.

Number of commits to Pinot Github since 5/31/20 (Source)
