TL;DR
LeaFi adds machine-learned filters to tree-based data series indexes, substantially improving pruning effectiveness and search speed while maintaining high recall across diverse datasets.
Contribution
The paper presents LeaFi, a novel framework that integrates learned filters into tree-based indexes to optimize pruning and search performance in data series similarity search.
Findings
Pruning ratio improved by up to 20x.
Search time reduced by up to 32x.
Maintains 99% recall across datasets.
Abstract
The ever-growing collections of data series create a pressing need for efficient similarity search, which serves as the backbone for various analytics pipelines. Recent studies have shown that tree-based series indexes excel in many scenarios. However, we observe a significant waste of effort during search, due to suboptimal pruning. To address this issue, we introduce LeaFi, a novel framework that uses machine learning models to boost the pruning effectiveness of tree-based data series indexes. These models act as learned filters, which predict tight node-wise distance lower bounds that are used to make pruning decisions, thus improving pruning effectiveness. We describe the LeaFi-enhanced index building algorithm, which selects leaf nodes and generates training data to insert and train machine learning models, as well as the LeaFi-enhanced search algorithm, which calibrates learned…
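The pruning idea in the abstract (a node is skipped when a distance lower bound for it is no smaller than the best distance found so far, and a learned filter supplies a tighter predicted bound) can be illustrated with a small Python sketch. Every detail here is an assumption for illustration, not LeaFi's actual implementation. Leaves are summarized by a per-timestamp min/max envelope, which gives a classic, provably correct lower bound. The "learned filter" is a one-feature linear regression on the distance to the leaf centroid rather than a neural network. Its calibration is a simple empirical-quantile margin. The names `Leaf`, `train_filter`, `pruning_bound`, `search_1nn`, and `demo` are hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Leaf:
    """A leaf node holding raw series plus a per-timestamp min/max envelope."""

    def __init__(self, series):
        self.series = series
        self.lo = [min(col) for col in zip(*series)]
        self.hi = [max(col) for col in zip(*series)]
        self.centroid = [sum(col) / len(series) for col in zip(*series)]
        self.filter = None  # (slope, intercept, margin) once trained

    def envelope_lb(self, q):
        """Classic, provably correct lower bound: distance from q to the envelope."""
        s = 0.0
        for x, lo, hi in zip(q, self.lo, self.hi):
            if x < lo:
                s += (lo - x) ** 2
            elif x > hi:
                s += (x - hi) ** 2
        return math.sqrt(s)

    def nn_dist(self, q):
        """Exact nearest-neighbor distance inside this leaf (scans every series)."""
        return min(dist(q, s) for s in self.series)

    def train_filter(self, train_queries, quantile=0.95):
        """Hypothetical learned filter: least-squares fit of
        nn_dist ~ slope * dist(q, centroid) + intercept, then a margin chosen so that
        (prediction - margin) <= true leaf NN distance on `quantile` of training queries."""
        xs = [dist(q, self.centroid) for q in train_queries]
        ys = [self.nn_dist(q) for q in train_queries]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var > 0 else 0.0
        intercept = my - slope * mx
        overshoot = sorted(slope * x + intercept - y for x, y in zip(xs, ys))
        margin = overshoot[max(0, math.ceil(quantile * n) - 1)]
        self.filter = (slope, intercept, margin)

    def pruning_bound(self, q):
        """Bound used for pruning: the larger of the classic and the learned bound."""
        lb = self.envelope_lb(q)
        if self.filter is None:
            return lb
        slope, intercept, margin = self.filter
        return max(lb, slope * dist(q, self.centroid) + intercept - margin)

def search_1nn(q, leaves):
    """1-NN search visiting leaves by classic bound; skip a leaf if bound >= best-so-far."""
    best, visited = math.inf, 0
    for leaf in sorted(leaves, key=lambda node: node.envelope_lb(q)):
        if leaf.pruning_bound(q) >= best:
            continue  # pruned: this leaf is predicted not to contain a closer series
        visited += 1
        best = min(best, leaf.nn_dist(q))
    return best, visited

def demo(seed=0, dim=16, n_leaves=8, per_leaf=20, n_train=50, n_test=50):
    """Compare leaves visited with and without learned filters on synthetic clusters."""
    rng = random.Random(seed)
    centers = [[rng.gauss(0, 3) for _ in range(dim)] for _ in range(n_leaves)]

    def sample():
        c = rng.choice(centers)
        return [x + rng.gauss(0, 1) for x in c]

    leaves = [Leaf([[x + rng.gauss(0, 1) for x in c] for _ in range(per_leaf)]) for c in centers]
    test = [sample() for _ in range(n_test)]
    exact = [min(leaf.nn_dist(q) for leaf in leaves) for q in test]

    plain = [search_1nn(q, leaves) for q in test]  # classic bounds only: exact search
    train = [sample() for _ in range(n_train)]
    for leaf in leaves:
        leaf.train_filter(train)
    learned = [search_1nn(q, leaves) for q in test]  # learned bounds: may lose recall

    return {
        "plain_exact": all(math.isclose(d, e) for (d, _), e in zip(plain, exact)),
        "plain_visited": sum(v for _, v in plain) / n_test,
        "learned_visited": sum(v for _, v in learned) / n_test,
        "learned_recall": sum(math.isclose(d, e) for (d, _), e in zip(learned, exact)) / n_test,
    }
```

Calling `demo()` returns average leaves visited with and without the learned filters, plus the fraction of test queries whose true nearest neighbor was still found. Search with only the envelope bound is always exact. The learned bound can overestimate, so `learned_recall` may drop below 1.0. The abstract's mention of calibration suggests LeaFi manages this trade-off against a recall target, and the quantile margin here is only a simple stand-in for that step.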
