A Lazy Approach for Efficient Index Learning

Guanli Liu; Lars Kulik; Xingjun Ma; Jianzhong Qi

arXiv:2102.08081 · cs.DB · February 19, 2021 · 1 citation

TL;DR

This paper introduces a lazy, model-reuse approach for learned indices: models are pre-trained on synthetic datasets and then reused to index real datasets, which reduces training cost and makes updates more efficient.

Contribution

The paper proposes a novel pre-training and model reuse strategy for learned indices, addressing efficiency and update challenges in practical applications.

Findings

01. Effective model reuse reduces training time.

02. The indexing error is bounded when a pre-trained index is selected.

03. Experimental results confirm improved efficiency and accuracy.

Abstract

Learned indices using neural networks have been shown to outperform traditional indices such as B-trees in both query time and memory. However, learning the distribution of a large dataset can be expensive, and updating learned indices is difficult, thus hindering their usage in practical applications. In this paper, we address the efficiency and update issues of learned indices through agile model reuse. We pre-train learned indices over a set of synthetic (rather than real) datasets and propose a novel approach to reuse these pre-trained models for a new (real) dataset. The synthetic datasets are created to cover a large range of different distributions. Given a new dataset DT, we select the learned index of a synthetic dataset similar to DT, to index DT. We show a bound over the indexing error when a pre-trained index is selected. We further show how our techniques can handle data…
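
The reuse idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions made only for illustration: a piecewise-linear CDF table stands in for the neural network models, the synthetic distributions are simple power-law skews, similarity is measured as the largest gap between two CDF tables, and the error bound is measured by one pass over the new dataset rather than derived analytically. All function names (`pretrain_synthetic`, `build_reused_index`, `lookup`) are hypothetical.

```python
import bisect
import random

SEGMENTS = 64  # linear pieces per model (illustrative choice)
GRID = [i / SEGMENTS for i in range(SEGMENTS + 1)]

def cdf_table(sorted_unit_keys):
    """CDF values at the grid knots for sorted keys scaled into [0, 1]."""
    n = len(sorted_unit_keys)
    return [bisect.bisect_right(sorted_unit_keys, g) / n for g in GRID]

def predict(model, x):
    """Linearly interpolate the CDF table at x (clamped to [0, 1])."""
    pos = min(max(x, 0.0), 1.0) * SEGMENTS
    i = min(int(pos), SEGMENTS - 1)
    return model[i] + (pos - i) * (model[i + 1] - model[i])

def pretrain_synthetic(powers=(0.25, 0.5, 1.0, 2.0, 4.0), size=2000, seed=0):
    """Build one stand-in model per synthetic distribution (assumed shapes)."""
    rng = random.Random(seed)
    models = []
    for p in powers:
        raw = sorted(rng.random() ** p for _ in range(size))
        lo, span = raw[0], (raw[-1] - raw[0]) or 1.0
        models.append(cdf_table([(v - lo) / span for v in raw]))
    return models

def build_reused_index(keys, models):
    """Pick the most similar pre-built model; no training on the new data."""
    ks = sorted(keys)
    n = len(ks)
    lo, span = ks[0], (ks[-1] - ks[0]) or 1.0
    unit = [(k - lo) / span for k in ks]
    target = cdf_table(unit)
    # Assumed similarity measure: largest gap between the two CDF tables.
    model = min(models, key=lambda m: max(abs(a - b) for a, b in zip(m, target)))
    # Empirical error bound: worst gap between predicted and true position.
    err = max(abs(int(predict(model, x) * n) - i) for i, x in enumerate(unit))
    return {"keys": ks, "lo": lo, "span": span, "model": model, "err": err}

def lookup(index, key):
    """Return a position of key in the sorted keys, or -1 if absent."""
    ks, n = index["keys"], len(index["keys"])
    x = (key - index["lo"]) / index["span"]
    guess = int(predict(index["model"], x) * n)
    left = max(0, guess - index["err"])
    right = min(n, guess + index["err"] + 1)
    # Binary search only inside the window allowed by the error bound.
    j = bisect.bisect_left(ks, key, left, right)
    return j if j < n and ks[j] == key else -1
```

In this sketch the only per-dataset work is sorting, one similarity comparison per pre-built model, and one pass to measure the error bound. A lookup then searches at most `2 * err + 1` positions around the predicted one, so a closer match between the new dataset and the chosen synthetic distribution gives a narrower search window.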

Code & Models

Repositories

Liuguanli/Liuguanli.github.io
pytorch

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Domain Adaptation and Few-Shot Learning · Hydrological Forecasting Using AI