Release Version 0.7.1: November 2023

Significant Apache Pinot updates since last StarTree release

For complete details on Pinot changes, see Releases.

Skip unparseable records in the CSV reader. To enable, set the skipUnParseableLines flag to true (pull request).
Protocol buffer ingestion supports null values with Proto 3 (pull request)
Upgrade Confluent libraries from 5.5.3 to 7.2.6 (pull request)
Faster real-time table ingestion with updates to the segment builder. To enable, edit the table configuration to set realtime.segment.flush.enable_column_major to true (pull request)
Improve alias handling in single-stage engine with multiple fixes to column aliases (pull request)
Enhance handling of new partitions when using StrictReplicaRouting to prevent “instance unavailable” exceptions (pull request)
Optimize performance in the multi-stage engine:
- For a single join key and group key scenario, operate directly on the key values without wrappers (pull request)
- Operate on column indexes in multi-stage aggregations to prevent extra conversion steps
- Avoid converting unnecessary rows in aggregations (pull request)
Enhance segment assignments for upsert tables with more checks to ensure that the conditions required for upsert functionality to work are not violated (pull request)
Fix handling of literals used in aggregation for v2 engine (pull request)
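
As a rough illustration of the column-major segment builder setting mentioned above, the following Python sketch adds the flag to a map of stream properties. It assumes the flag sits alongside the other stream-level properties of a real-time table and that its value is the string "true"; both are assumptions to check against the Pinot documentation for your version. The helper name `enable_column_major` and the sample property values are hypothetical.

```python
import json

# Flag name taken from the release note above.
COLUMN_MAJOR_FLAG = "realtime.segment.flush.enable_column_major"


def enable_column_major(stream_configs: dict) -> dict:
    """Return a copy of the stream properties with the flag switched on.

    Assumption: the flag is stored as the string "true" next to the other
    stream-level properties of the table configuration.
    """
    updated = dict(stream_configs)  # shallow copy, the input is left untouched
    updated[COLUMN_MAJOR_FLAG] = "true"
    return updated


if __name__ == "__main__":
    # Hypothetical existing stream properties, for illustration only.
    existing = {"streamType": "kafka"}
    print(json.dumps(enable_column_major(existing), indent=2))
```

The helper returns a new map rather than mutating its input, so the original configuration can be kept for comparison before the change is applied.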

You must now specify the data type of literals in Pinot queries. Before this change, for example, 2022-02-02 22:22:22.123 was automatically treated as a timestamp data type. Now, following standard SQL behavior, use CAST('2022-02-02 22:22:22.123' AS TIMESTAMP) instead (pull request).
Change the “forbidden” error to “unauthorized” (pull request)
Table configurations that point to a different schema name no longer work (pull request).
You can no longer change the table state using the GET call (pull request).
You can no longer create a schema with NaN as the default value (pull request).
BigDecimal responses are now stored as a string with double quotes instead of a number (pull request).
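
Two of the changes above affect client code directly: timestamp literals now need an explicit CAST, and BigDecimal values arrive as quoted strings. The Python sketch below shows one way a client might adapt. The helper names and the sample values are hypothetical and not part of any Pinot client library.

```python
from decimal import Decimal


def timestamp_literal(value: str) -> str:
    """Wrap a timestamp string in an explicit CAST, as the release note requires."""
    escaped = value.replace("'", "''")  # double any single quotes for SQL
    return f"CAST('{escaped}' AS TIMESTAMP)"


def to_decimal(cell) -> Decimal:
    """Convert a BigDecimal cell to Decimal.

    Accepts the new string form as well as the older numeric form, so the
    same client code can read responses produced before and after the change.
    """
    if isinstance(cell, str):
        return Decimal(cell)
    return Decimal(str(cell))


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    literal = timestamp_literal("2022-02-02 22:22:22.123")
    print(f"SELECT COUNT(*) FROM events WHERE ts > {literal}")

    # A BigDecimal value as it might appear in a response after this release.
    print(to_decimal("12345678901234567890.123456789"))
```

Parsing the string with `Decimal` rather than `float` keeps the full precision of the value, which is the usual reason a BigDecimal column is chosen in the first place.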

The following updates are available only in StarTree Cloud.

Improvements to file ingestion task:
- Enhancements to batch ingestion using minion to improve atomic ingestion and backfill operations
- Control size-based segment creation with desiredSegmentSize to improve performance
Automatically tune segment size for the segment refresh task without configuring maxNumRecordsPerTask and maxNumRecordsPerSegment. Size-based tuning helps produce predictable segment sizes and avoid memory- or size-related exceptions.
Validation is stricter for using sync mode in conjunction with other tasks. You can no longer schedule the segment refresh task at the same time as sync mode.
Separate RocksDB logs from server logs to improve the debugging experience and allow you to set different retention and rollover policies
Improve Kafka logs by changing the following classes to error-level:
- KafkaConsumer
- AppInfoParser
- ConsumerConfig
Enhancements to upsert tables:
- Correctly track primary key count and add corresponding metrics
- Improve stability during deletion
Improve performance and navigation in broker and server Grafana dashboards
Move to Google Trust Services Certificate Authority to improve certificate management

Improve data sampling from Kafka topics with large numbers of partitions by preventing the “no data” error in preview
Automate Google Cloud Platform (GCP) credentials in Data Manager so you can ingest data without having to contact StarTree support
Improve error messages to aid troubleshooting

Improve loading time for multi-dimension alerts and dashboard statistics
Simplify alert creation with advanced anomaly detection and tuning options that reduce the complexity of handling data patterns and seasonality