LeaFi: Data Series Indexes on Steroids with Learned Filters

Qitong Wang; Ioana Ileana; Themis Palpanas

arXiv:2502.01836·cs.DB·February 5, 2025

TL;DR

LeaFi adds learned filters (machine learning models) to tree-based data series indexes, improving the pruning ratio by up to 20x and reducing search time by up to 32x while maintaining 99% recall across diverse datasets.

Contribution

The paper presents LeaFi, a novel framework that integrates learned filters into tree-based indexes to optimize pruning and search performance in data series similarity search.

Findings

01 Pruning ratio improved by up to 20x.

02 Search time reduced by up to 32x.

03 Maintains 99% recall across datasets.

Abstract

The ever-growing collections of data series create a pressing need for efficient similarity search, which serves as the backbone for various analytics pipelines. Recent studies have shown that tree-based series indexes excel in many scenarios. However, we observe a significant waste of effort during search due to suboptimal pruning. To address this issue, we introduce LeaFi, a novel framework that uses machine learning models to boost the pruning effectiveness of tree-based data series indexes. These models act as learned filters: they predict tight node-wise distance lower bounds that are used to make pruning decisions, thereby improving pruning effectiveness. We describe the LeaFi-enhanced index building algorithm, which selects leaf nodes and generates training data to insert and train machine learning models, as well as the LeaFi-enhanced search algorithm, which calibrates learned…
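To make the idea of node-wise pruning concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a flat list of leaves rather than a real tree, and every name in it (`Leaf`, `summary_lower_bound`, `learned_filter`, `margin`, `search`) is hypothetical. Each leaf carries a conventional summary-based lower bound and, optionally, a model that predicts the distance from the query to the leaf's nearest series. A leaf is skipped when either bound is no smaller than the best distance found so far. The `margin` parameter is an illustrative safety offset of our own and does not reproduce the paper's procedure.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Series = Sequence[float]


def euclidean(a: Series, b: Series) -> float:
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Leaf:
    """Hypothetical leaf node: raw series plus two pruning bounds."""
    series: List[Series]
    # Conventional bound derived from the node summary (often loose).
    summary_lower_bound: Callable[[Series], float]
    # Optional learned filter: predicts the query's distance to the
    # nearest series stored in this leaf (assumed interface).
    learned_filter: Optional[Callable[[Series], float]] = None


def search(query: Series, leaves: List[Leaf],
           margin: float = 0.0) -> Tuple[float, int]:
    """Return (best distance found, number of leaves actually scanned)."""
    best = math.inf
    visited = 0
    for leaf in leaves:
        # Step 1: classic summary-based pruning.
        if leaf.summary_lower_bound(query) >= best:
            continue
        # Step 2: learned-filter pruning; the margin makes the
        # prediction more conservative (illustrative assumption).
        if leaf.learned_filter is not None:
            if leaf.learned_filter(query) - margin >= best:
                continue
        # Step 3: the leaf survived both checks, so scan its series.
        visited += 1
        for s in leaf.series:
            best = min(best, euclidean(query, s))
    return best, visited


# Toy data: the summary bound is useless (always 0), so only the
# learned filter can prune the far-away leaf.
near = Leaf(series=[[0.0, 0.0], [1.0, 1.0]],
            summary_lower_bound=lambda q: 0.0)
far_plain = Leaf(series=[[10.0, 10.0], [11.0, 11.0]],
                 summary_lower_bound=lambda q: 0.0)
far_filtered = Leaf(series=[[10.0, 10.0], [11.0, 11.0]],
                    summary_lower_bound=lambda q: 0.0,
                    learned_filter=lambda q: 13.0)  # stand-in for a model

query = [0.1, 0.1]
best_plain, visited_plain = search(query, [near, far_plain])
best_filtered, visited_filtered = search(query, [near, far_filtered])
```

In this toy setup both searches find the same nearest neighbor, but the filtered run scans one leaf instead of two. Because a learned prediction can overestimate the true distance, a filter may prune a leaf that actually holds the answer, which is why such a scheme trades exactness for a recall target.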

Code & Models

Repositories

qtwang/leafi
Official
