Learned Indexes with Distribution Smoothing via Virtual Points

Kasun Amarasinghe; Farhana Choudhury; Jianzhong Qi; James Bailey

arXiv:2408.06134·cs.DB·December 17, 2024

TL;DR

This paper introduces a distribution smoothing technique that uses virtual points to enhance learned indexes, significantly improving query performance, especially for hard-to-model key regions, with minimal additional storage.

Contribution

The paper proposes a distribution smoothing method based on virtual points, together with an algorithm, CSV, that improves learned index accuracy and efficiency without structural changes to the index.

Findings

01. Significant query performance improvements observed.
02. Enhanced accuracy for difficult key regions.
03. Low additional storage overhead.

Abstract

Recent research on learned indexes has created a new perspective for indexes as models that map keys to their respective storage locations. These learned indexes are created to approximate the cumulative distribution function of the key set, where using only a single model may have limited accuracy. To overcome this limitation, a typical method is to use multiple models, arranged in a hierarchical manner, where the query performance depends on two aspects: (i) traversal time to find the correct model and (ii) search time to find the key in the selected model. Such a method may cause some key space regions that are difficult to model to be placed at deeper levels in the hierarchy. To address this issue, we propose an alternative method that modifies the key space as opposed to any structural or model modifications. This is achieved through making the key set more learnable (i.e.,…
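To make the abstract's starting point concrete, here is a minimal sketch of a single-model learned index: a least-squares line approximates the key-to-position mapping (a scaled CDF), and the model's worst-case prediction error bounds the final search window. This is a generic illustration, not the paper's CSV algorithm or its virtual-point method; the names `fit_linear`, `build_index`, and `lookup` are hypothetical and chosen only for this example.

```python
from bisect import bisect_left


def fit_linear(keys):
    """Least-squares line mapping key -> position in the sorted key array."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    return slope, mean_p - slope * mean_k


def build_index(keys):
    """Return (sorted keys, slope, intercept, max prediction error)."""
    keys = sorted(keys)
    slope, intercept = fit_linear(keys)
    # The largest gap between predicted and true position bounds the search window.
    max_err = max(abs(round(slope * k + intercept) - i) for i, k in enumerate(keys))
    return keys, slope, intercept, max_err


def lookup(index, key):
    """Predict a position, then binary-search only within the error window."""
    keys, slope, intercept, max_err = index
    n = len(keys)
    guess = round(slope * key + intercept)
    lo = min(max(0, guess - max_err), n)
    hi = max(lo, min(n, guess + max_err + 1))
    pos = bisect_left(keys, key, lo, hi)
    return pos if pos < n and keys[pos] == key else None


# Evenly spaced keys are easy for one linear model; skewed keys are not.
uniform = build_index(range(0, 80, 10))
skewed = build_index([1, 2, 4, 8, 16, 32, 64, 128])
print("max error (uniform):", uniform[3])
print("max error (skewed):", skewed[3])
print("position of 16 in skewed index:", lookup(skewed, 16))
```

On skewed key sets the error bound grows, which widens the search window. In this simplified setting, that corresponds to the "difficult to model" regions the abstract refers to, and it is the reason hierarchical designs place several models over the key space.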


Taxonomy

Topics: Neural Networks and Applications · Advanced Vision and Imaging · Remote Sensing and LiDAR Applications