LSI: A Learned Secondary Index Structure
Andreas Kipf, Dominik Horn, Pascal Pfeil, Ryan Marcus, Tim Kraska

TL;DR
This paper introduces LSI, a learned secondary index structure for unsorted data. LSI builds a learned index over a permutation vector and adds a fingerprint vector for equality lookups. It matches the lookup performance of state-of-the-art secondary indexes while using up to 6x less space.
Contribution
The paper presents the first learned index designed specifically for secondary indexing on unsorted data. It combines a permutation vector, which enables binary search over the unsorted base data, with a fingerprint vector, which accelerates equality lookups.
Findings
LSI achieves comparable lookup performance to state-of-the-art secondary indexes.
LSI is up to 6x more space efficient than traditional secondary indexes.
LSI extends learned indexing, previously studied mainly for sorted (primary) data, to unsorted data.
Abstract
Learned index structures have been shown to achieve favorable lookup performance and space consumption compared to their traditional counterparts such as B-trees. However, most learned index studies have focused on the primary indexing setting, where the base data is sorted. In this work, we investigate whether learned indexes sustain their advantage in the secondary indexing setting. We introduce Learned Secondary Index (LSI), a first attempt to use learned indexes for indexing unsorted data. LSI works by building a learned index over a permutation vector, which allows binary search to be performed on the unsorted base data using random access. We additionally augment LSI with a fingerprint vector to accelerate equality lookups. We show that LSI achieves comparable lookup performance to state-of-the-art secondary indexes while being up to 6x more space efficient.
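The mechanism described in the abstract can be sketched in a few dozen lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation. A single least-squares line stands in for whatever learned model LSI actually uses. The 8-bit multiplicative-hash `fingerprint` function, the `LSISketch` class and its `lower_bound` and `find_equal` methods are hypothetical names. Scanning the fingerprint vector inside the model's error window is only one plausible way to use fingerprints for equality lookups.

```python
import math
from typing import List, Tuple


def fingerprint(key: int, bits: int = 8) -> int:
    # Hypothetical fingerprint: top `bits` bits of a 64-bit multiplicative hash.
    return ((key * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF) >> (64 - bits)


class LSISketch:
    """Toy learned secondary index over unsorted data (illustrative only)."""

    def __init__(self, base: List[int], fp_bits: int = 8):
        if not base:
            raise ValueError("sketch assumes non-empty base data")
        self.base = base  # base data stays unsorted and in place
        self.fp_bits = fp_bits
        # Permutation vector: base positions listed in ascending key order.
        self.perm = sorted(range(len(base)), key=lambda i: base[i])
        keys = [base[i] for i in self.perm]  # only needed while building
        n = len(keys)
        # Assumed model: one least-squares line mapping key -> rank.
        mean_k = sum(keys) / n
        mean_r = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (r - mean_r) for r, k in enumerate(keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_r - self.slope * mean_k
        # Maximum absolute prediction error bounds every later search.
        self.err = max(abs(self._predict(k) - r) for r, k in enumerate(keys))
        # Fingerprint vector, aligned slot-for-slot with the permutation vector.
        self.fps = [fingerprint(k, fp_bits) for k in keys]

    def _predict(self, key: int) -> float:
        return self.slope * key + self.intercept

    def _window(self, key: int) -> Tuple[int, int]:
        # Search bounds [lo, hi) derived from the model's error; one extra
        # slot on each side guards against floating-point rounding.
        n = len(self.perm)
        p = self._predict(key)
        lo = min(n, max(0, math.floor(p - self.err) - 1))
        hi = min(n, max(0, math.ceil(p + self.err) + 2))
        return lo, hi

    def lower_bound(self, key: int) -> int:
        """Rank of the first record whose key is >= key (binary search)."""
        lo, hi = self._window(key)
        while lo < hi:
            mid = (lo + hi) // 2
            # Each probe is a random access into the unsorted base data.
            if self.base[self.perm[mid]] < key:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def find_equal(self, key: int) -> List[int]:
        """Base positions of all records equal to key (fingerprint-filtered scan)."""
        fq = fingerprint(key, self.fp_bits)
        lo, hi = self._window(key)
        return [
            self.perm[i]
            for i in range(lo, hi)
            # Only a fingerprint hit pays for a random access into base.
            if self.fps[i] == fq and self.base[self.perm[i]] == key
        ]
```

For example, with `base = [42, 7, 19, 7, 88, 3, 55, 19, 19, 61]`, the permutation vector is `[5, 1, 3, 2, 7, 8, 0, 6, 9, 4]` and `LSISketch(base).find_equal(19)` returns `[2, 7, 8]`. After construction the sketch keeps only the permutation vector, two model parameters, the error bound, and the fingerprints; the sorted keys are discarded and the base data is never reordered.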
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques · Data Management and Algorithms
Methods: Balanced Selection
