Release Version 0.6.0: Feb 2023
Apache Pinot
Multi stage enhancements
The multi stage query engine now supports the HAVING
, ORDER BY
, IN
, and NOT IN
clauses. It also supports left joins, semi joins, and inequality joins. All functions have now been registered with the Calcite catalog reader.
V1 query engine enhancements
Added support for isDistinctFrom and isNotDistinctFrom. More details in the pull request.
Multi volume
Added capability to tiering storage locally by attaching multiple disks (eg: SSD + HDD). More details in the Pinot documentation.
Segment level debug
Added segment level Debug API and UI. More details in the pull request.
Consumer record lag
Added a new API for exposing record based lag during real-time consumption. Currently this is only supported for Kafka data source. More details in the pull request.
Custom time boundary for hybrid tables
Added new APIs for configuring a custom time boundary for hybrid Pinot tables as well as a validation API to ensure all segments have finished uploading during offline push (check against ideal state). More details in the pull request.
Upserts enhancements
Added enableSnapshot flag in upsertConfig to use snapshot for upsert metadata recovery. This will help achieve TTL (Time To Live) support for primary keys. More details in the Pinot documentation and pull request.
Extract metadata from stream event header
Added support for using fields within the message envelope as columns in the Pinot schema (eg: key within a Kafka message envelope). More details in the pull request.
Segment Reload enhancements
Added ability to change compression type during segment reload. More details in this PR. In addition, added capability within the Pinot UI to track reload progress. More details in PR-9521 and PR-9700.
Adaptive Server Selection
Warning: This feature is experimental, under development, and turned off by default. We recommend using this feature for testing purposes only.
When a query is received, we could use one of the implemented Adaptive Selectors (NumInFlightRequests, Latency, Hybrid) to efficiently route queries to the best server instead of using a naive round robin approach. More details can be found in the Pinot documentation.
Frictionless Ingestion
- Automatically infer parquet reader type based on file metadata in case of offline ingestion. More details in this PR.
- Added Spark Job Launcher utility for offline table ingestion within the Pinot admin tool for ease of use. More details in this PR.
- Added continueOnError flag within the Pinot table config. If set to true, any errors from data type or expression transformations are ignored and null / default values are used instead. This is useful when users don’t want the ingestion to stop because of a few bad records. More details in PR-9320 and PR-9376.
MergeRollup task on real-time tables
Added support for merging / rolling up segments of a real-time Pinot table. More details in this PR.
Force commit
Added a new resetConsumption API in the controller to force the current consuming segment in a real-time Pinot table to be committed. More details in this PR.
Seamless stream change
Added the capability to modify stream properties (for eg: start consuming from a different Kafka topic) without disabling the table. More details in this PR.
Logging enhancements
- Added a /loggers API endpoint to change logging level at runtime. More details in this PR.
- Added a new API to allow downloading logs from individual components (broker/server) as well as a new controller API to download any remote log. More details in this PR.
StarTree Extensions for Apache Pinot: Available only in StarTree Cloud
RocksDB backed Upsert BETA
Added the capability of configuring RocksDB backend for managing upsert metadata in a Pinot server. This enables the server to handle a lot more primary keys than before (previously this was done in memory).
Databricks Delta IngestionALPHA
Added support for ingesting data from a Databricks Delta table into an offline Pinot table.
SegmentRefresh for RT tables ALPHA
Added support for performing a segment refresh operations for all completed segments within a real-time Pinot table. This enables users to ensure real-time table segments adhere to the latest Pinot table config.
Debezium connector for MySQL ALPHA
Added support for ingesting MySQL Debezium CDC format messages from a real-time stream
Tiered storage BETA
- Improved server restart time when cloud tiered storage is enabled by persistently caching certain portions of all column indexes needed during restart.
- Query performance improvements using selective columnar fetch based on query pattern and block level reads
File size based task planner in ingestion GA
Added capability to configure minion tasks to ingest in a size based manner in addition to count based. This allows the user to ingest all files from a data source in a single round of tasks based on the total size.
StarTree Cloud – includes BYOC (Bring Your Own Cloud) and SaaS
Disaster Recovery for data plane ALPHA
StarTree Admin is able to recover a given workspace from a region failure by recovering the StarTree cluster state (RTO: 24 hours)
Release decoupling ALPHA
StarTree admins can now release individual components like Pinot or Data Manager without requiring a full release
Authentication service GA
Authentication service for secure access to Startree environments is now GA
Token generation for Try environments GA
Token generation for secure access to Pinot cluster in trial environments is now GA.
Data Manager: Self-Service Ingestion tool
Improved AWS IAM Role based onboarding experience BETA
Users now can check the AWS account id directly on DM instead of asking the StarTree customer support team. More enhancements to come in the next release.
Dimension Table support GA
Users can now create dimension tables from DM.
Enhanced Datetime inference logic GA
More accurate datetime column inference during data modeling in DM
Support enhanced security mechanisms for Kafka SASL authentication GA
Support added in DM for the following SASL mechanisms:
- PLAINTEXT
- SCRAM-SHA-256
- SRAM-SHA-512
Support for GZ file ingestion GA
Gzipped files can be ingested via DM directly..
Other supported formats: Avro, Json, CSV, Parquet, ORC
Enhanced Data ingestion GA
Couple of improvements for robust data ingestion experience:
- Improved dictionary inference logic
- Data size configuration for Minion
Support for schema registry in Kafka connector GA
Users will only see the topics that are registered in the schema registry and the data format is no longer needed to be selected.
Record reader config support GA
Users can now pass the record reader specific configs via DM. (e.g. split delimiter for csv reader)
BigQuery connector GA
Users can now self-serve data ingestion from BigQuery using Data Manager no-code experience with a few clicks.
For more information, see https://startree.ai/docs/startree-extensions/sql-connector.
Dataset ingestion status GA
Users can now monitor the data ingestion status and view ingestion logs after submitting the ingestion jobs. This will help users to debug issues and fix them as needed.
For more information, see https://startree.ai/docs/startree-enterprise-edition/startree-dataset-manager/ingestion-status.
Kinesis connector GA
Users can now self-serve data ingestion from AWS Kinesis using Data Manager no-code experience with few clicks.
For more information, see https://startree.ai/docs/startree-enterprise-edition/startree-dataset-manager/kinesis.
ThirdEye: Anomaly Detection and Root Cause Analysis Tool
Cohort recommender GA
Cohort recommender) will help users identify the cohorts of top contributors (single or a group of dimensions) contributing to a spike in a given metric of a dataset. Using this feature user can now monitor single or multiple time series data for a given time range for the in-scope dataset and metrics.
For more information, see https://startree.ai/docs/startree-enterprise-edition/startree-thirdeye/reference/cohort-recommender.
Guided onboarding and alert creation flow GA
Users of ThirdEye can now use guided ThirdEye onboarding and sample alerts to quickly try, evaluate and get started with ThirdEye in a few clicks.
Integrated flow for cohort recommender and multi-dimension alert creation GA
Users of ThirdEye can now use cohort recommender to identify top contributors (single or group of dimensions) for a given metric and monitor single or multiple time series in two to three clicks. Using these features users need not manually configure alerts which can be time-consuming and prone to error.
Bulk delete support for Alerts, anomalies and related entities GA
Users can now bulk delete alerts, anomalies, and related entities to clean up all experiment data or static data which are no longer in use.
Anomaly filters GA
Now users can apply “Anomaly filters” (such as Threshold-based, Weekend, Holiday, and Cold_Start filters) as part of the low-code alert configuration to fine-tune the alerts and improve accuracy by eliminating noise.
Root-cause analysis (Dimension filter (include/exclude) GA
Users can now perform root-cause analysis for a selected set of dimensions instead of entire list of dimensions.
For more information, see https://startree.ai/docs/startree-enterprise-edition/startree-thirdeye/concepts/alert-configuration#rcaexcludeddimensions.
Launched Dimension Exploration GA
Users can now monitor multiple time series for single or group dimensions for a dataset and metric by configuring a single one-time (low-code/no-code) alert.
For more information, see https://startree.ai/docs/startree-enterprise-edition/startree-thirdeye/concepts/dimension-exploration.
Load default alert templates GA
Now users can load default alert templates from the Alert Templates Configuration screen instead of using API to load default alert templates.
Alert creation (Derived/transformed metrics support) GA
Users can now create “derived or transformed metrics” during alert creation.
For more information, see https://startree.ai/docs/startree-enterprise-edition/startree-thirdeye/how-tos/alert/derived-metric-alert.