A Lazy Approach for Efficient Index Learning
Guanli Liu, Lars Kulik, Xingjun Ma, Jianzhong Qi

TL;DR
This paper introduces a lazy model-reuse approach for learned indices: index models are pre-trained on synthetic datasets and then reused for real datasets, which reduces training cost and improves update efficiency.
Contribution
The paper proposes a novel pre-training and model reuse strategy for learned indices, addressing efficiency and update challenges in practical applications.
Findings
Reusing pre-trained models reduces training time on new datasets.
The indexing error is bounded when a pre-trained index is selected.
Experimental results confirm improved efficiency and accuracy.
Abstract
Learned indices using neural networks have been shown to outperform traditional indices such as B-trees in both query time and memory. However, learning the distribution of a large dataset can be expensive, and updating learned indices is difficult, thus hindering their usage in practical applications. In this paper, we address the efficiency and update issues of learned indices through agile model reuse. We pre-train learned indices over a set of synthetic (rather than real) datasets and propose a novel approach to reuse these pre-trained models for a new (real) dataset. The synthetic datasets are created to cover a large range of different distributions. Given a new dataset DT, we select the learned index of a synthetic dataset similar to DT, to index DT. We show a bound over the indexing error when a pre-trained index is selected. We further show how our techniques can handle data…
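The reuse idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the paper uses neural networks, whereas the sketch swaps in a piecewise-linear CDF approximation as a stand-in "pre-trained model". The similarity measure (maximum gap between empirical CDFs), the choice of synthetic distributions, and all names (`PretrainedModel`, `synthetic_pool`, `select_model`, `ReusedIndex`) are assumptions made for illustration. The error used for the local search is measured empirically on the indexed keys, which is a simplification and not the bound derived in the paper.

```python
import bisect
import random


def normalize(keys):
    """Sort keys and rescale them to [0, 1] so distributions are comparable."""
    keys = sorted(keys)
    lo, hi = keys[0], keys[-1]
    span = (hi - lo) or 1.0
    return [(k - lo) / span for k in keys]


def cdf_samples(norm_keys, bins):
    """Empirical CDF at bins + 1 evenly spaced points in [0, 1]."""
    n = len(norm_keys)
    return [0.0] + [bisect.bisect_right(norm_keys, (i + 1) / bins) / n
                    for i in range(bins)]


class PretrainedModel:
    """Hypothetical stand-in for a pre-trained index model (assumption:
    a piecewise-linear CDF approximation instead of a neural network)."""

    def __init__(self, norm_keys, bins=64):
        self.bins = bins
        self.cdf = cdf_samples(norm_keys, bins)

    def predict(self, x):
        """Relative position in [0, 1] for a normalized key x."""
        x = min(max(x, 0.0), 1.0)
        pos = x * self.bins
        i = min(int(pos), self.bins - 1)
        return self.cdf[i] + (pos - i) * (self.cdf[i + 1] - self.cdf[i])


def synthetic_pool(n=2000, seed=0):
    """Pre-train one model per synthetic distribution (illustrative choice)."""
    rng = random.Random(seed)
    gens = {
        "uniform": rng.random,
        "normal": lambda: rng.gauss(0.0, 1.0),
        "exponential": lambda: rng.expovariate(1.0),
        "lognormal": lambda: rng.lognormvariate(0.0, 1.0),
    }
    return {name: PretrainedModel(normalize([g() for _ in range(n)]))
            for name, g in gens.items()}


def select_model(pool, norm_keys):
    """Pick the model whose CDF is closest to the new dataset's CDF
    (assumed similarity measure: maximum absolute CDF gap)."""
    def gap(model):
        target = cdf_samples(norm_keys, model.bins)
        return max(abs(a - b) for a, b in zip(model.cdf, target))
    name = min(pool, key=lambda k: gap(pool[k]))
    return name, pool[name]


class ReusedIndex:
    """Index a new dataset with a reused model plus a bounded local search."""

    def __init__(self, keys, pool):
        self.keys = sorted(keys)
        self.lo, self.hi = self.keys[0], self.keys[-1]
        self.name, self.model = select_model(pool, normalize(self.keys))
        # Largest observed prediction error; it bounds the search window.
        self.err = max(abs(self._predict(k) - i)
                       for i, k in enumerate(self.keys))

    def _predict(self, key):
        span = (self.hi - self.lo) or 1.0
        x = (key - self.lo) / span
        return int(self.model.predict(x) * (len(self.keys) - 1))

    def lookup(self, key):
        """Return the position of key in the sorted keys, or -1 if absent."""
        p = self._predict(key)
        lo = max(0, p - self.err)
        hi = min(len(self.keys), p + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else -1
```

In this sketch no model is trained on the new dataset: building `ReusedIndex` only computes a CDF summary, picks the closest entry from the pool, and measures the resulting error. A lookup then searches only the window of width `2 * err + 1` around the predicted position.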
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Domain Adaptation and Few-Shot Learning · Hydrological Forecasting Using AI
