Superseding traditional indexes by orchestrating learning and geometry
Giorgio Vinciguerra, Paolo Ferragina, Michele Miccinesi

TL;DR
This paper introduces the PGM-index, a learned index that improves on the time and space complexity of traditional indexes such as B-trees by covering the keys with learned linear models (segments). It also proposes a distribution-aware variant of the index.
Contribution
It presents the first learned index whose time and space complexity are provably better than those of classic indexes, introduces the concept of multicriteria data structures, and reports large experimental improvements.
Findings
Outperforms classic indexes in time and space complexity
Introduces a distribution-aware learned index variant
Achieves improvements of several orders of magnitude in experiments
Abstract
We design the first learned index that solves the dictionary problem with time and space complexity provably better than classic data structures for hierarchical memories, such as B-trees, and modern learned indexes. We call our solution the Piecewise Geometric Model index (PGM-index) because it turns the indexing of a sequence of keys into the coverage of a sequence of 2D-points via linear models (i.e. segments) suitably learned to trade query time vs space efficiency. This idea comes from some known heuristic results which we strengthen by showing that the minimal number of such segments can be computed via known and optimal streaming algorithms. Our index is then obtained by recursively applying this geometric idea that guarantees a smoothed adaptation to the "geometric complexity" of the input data. Finally, we propose a variant of the index that adapts not only to the distribution…
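The geometric idea in the abstract can be sketched in a few lines. Each key is treated as a 2D point (key, position in the sorted array), and the points are covered with segments so that every segment predicts the position of its keys to within a fixed error. What follows is a minimal illustration under stated assumptions, not the authors' implementation: the error bound `eps`, the class name `SegmentIndex`, and its methods are hypothetical names introduced here. It uses a simple greedy "shrinking cone" heuristic rather than the optimal streaming algorithm the abstract refers to, so it may produce more segments than the minimum. It also builds a single level of segments and finds the right segment by binary search, whereas the abstract describes applying the idea recursively.

```python
from bisect import bisect_left, bisect_right


class SegmentIndex:
    """Single-level sketch: cover the points (key, position) with segments
    whose predicted position is within eps of the true position.
    Assumes keys are sorted and distinct. Hypothetical names throughout."""

    def __init__(self, keys, eps):
        self.keys = keys
        self.eps = eps
        self.segments = self._build(keys, eps)  # (first_key, slope, first_pos)
        self.first_keys = [s[0] for s in self.segments]

    @staticmethod
    def _build(keys, eps):
        segments = []
        n = len(keys)
        i = 0
        while i < n:
            x0, y0 = keys[i], i
            lo, hi = 0.0, float("inf")  # feasible slopes through (x0, y0)
            j = i + 1
            while j < n:
                dx, dy = keys[j] - x0, j - y0
                # Slopes that keep point j within eps of the line.
                new_lo = max(lo, (dy - eps) / dx)
                new_hi = min(hi, (dy + eps) / dx)
                if new_lo > new_hi:
                    break  # cone is empty: point j starts a new segment
                lo, hi = new_lo, new_hi
                j += 1
            slope = 0.0 if hi == float("inf") else (lo + hi) / 2
            segments.append((x0, slope, y0))
            i = j
        return segments

    def predict(self, q):
        """Approximate position of q according to the segment covering it."""
        s = max(0, bisect_right(self.first_keys, q) - 1)
        x0, slope, y0 = self.segments[s]
        return y0 + slope * (q - x0)

    def find(self, q):
        """Return the position of q in keys, or -1 if q is absent."""
        keys = self.keys
        if not keys or q < keys[0]:
            return -1
        p = int(self.predict(q))
        # Search only a window of about 2*eps positions around the prediction
        # (one extra slot on each side absorbs floating-point rounding).
        lo = max(0, p - self.eps - 1)
        hi = min(len(keys), p + self.eps + 2)
        pos = bisect_left(keys, q, lo, hi)
        return pos if pos < len(keys) and keys[pos] == q else -1


# Usage: index a small sorted array with a maximum error of 4 positions.
index = SegmentIndex([2, 3, 5, 8, 13, 21, 34, 55, 89, 144], eps=4)
position = index.find(21)
```

In this sketch `eps` is the knob behind the query time vs space trade-off mentioned in the abstract: a larger value lets each segment cover more keys, so fewer segments are stored, at the cost of a wider final search window per query.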
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Time Series Analysis and Forecasting
