The PGM-index: a multicriteria, compressed and learned approach to data indexing
Paolo Ferragina, Giorgio Vinciguerra

TL;DR
The PGM-index is a learned data structure that guarantees I/O-optimal queries, uses far less space than traditional indexes, and comes in several variants that adapt to the data and query distributions.
Contribution
We introduce the PGM-index, a purely learned, geometrically inspired index that guarantees optimal I/O performance and outperforms traditional indexes in space efficiency.
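A back-of-the-envelope check of why a bounded prediction error matters for I/O. Suppose a model predicts a key's position within ±ε. The final search then reads only a contiguous window of about 2ε + 1 slots, and such a window spans a constant number of disk pages of B keys whenever ε is O(B). The function names and the exact window size 2ε + 1 are illustrative assumptions, not the paper's own analysis.

```python
def pages_touched(window: int, page_size: int) -> int:
    """Worst-case number of pages spanned by `window` consecutive slots,
    over every possible starting offset within a page."""
    if window <= 0:
        return 0
    # Equivalent to ceil((window - 1) / page_size) + 1, using integer arithmetic.
    return (window - 1 + page_size - 1) // page_size + 1


def max_pages_last_mile(eps: int, page_size: int) -> int:
    # Assumption (illustrative): the search window around the prediction
    # holds 2 * eps + 1 slots.
    return pages_touched(2 * eps + 1, page_size)
```

For instance, with B = 64 keys per page and ε = 32, the window touches at most 2 pages no matter where it starts. This constant does not grow with the number of indexed keys.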
Findings
The PGM-index reduces space by 63.3% compared to the FITing-tree.
The PGM-index uses over four orders of magnitude less space than the B-tree.
Variants of the PGM-index adapt to the data distribution and optimize space-time trade-offs.
Abstract
The recent introduction of learned indexes has shaken the foundations of the decades-old field of indexing data structures. Combining, or even replacing, classic design elements such as B-tree nodes with machine learning models has yielded outstanding improvements in the space footprint and time efficiency of data systems. However, these approaches are based on heuristics and thus lack guarantees on both their time and space requirements. We propose the Piecewise Geometric Model index (PGM-index for short), which achieves guaranteed I/O-optimality in query operations and learns an optimal number of linear models. Its recursive construction makes it a purely learned data structure, rather than a hybrid of traditional and learned indexes (such as RMI and FITing-tree). We show that the PGM-index improves the space of the FITing-tree by 63.3% and of the B-tree by…
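The ideas named in the abstract can be sketched in Python under several simplifying assumptions: linear models whose position error is bounded by ε, a recursive construction where each level indexes the first keys of the level below, and a query that walks from a single root model down to the data. The segmentation here is a greedy "shrinking cone" pass, swapped in for brevity. It is not the optimal piecewise linear approximation the PGM-index uses to learn the minimum number of models. All names (`Segment`, `build_segments`, `PGMSketch`, `lower_bound`) are hypothetical and do not come from the authors' library.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    key: float       # first key covered by this segment
    slope: float
    intercept: int   # position of `key` in the array this segment indexes

    def predict(self, k):
        return self.intercept + self.slope * (k - self.key)


def build_segments(keys, eps):
    """Greedy 'shrinking cone' segmentation (a stand-in, not the paper's optimal
    algorithm): each covered key is predicted within +/- eps of its position."""
    segments, i, n = [], 0, len(keys)
    while i < n:
        lo_slope, hi_slope = 0.0, float("inf")  # positions never decrease
        j = i + 1
        while j < n:
            dx = keys[j] - keys[i]
            new_lo = max(lo_slope, (j - i - eps) / dx)
            new_hi = min(hi_slope, (j - i + eps) / dx)
            if new_lo > new_hi:  # no single line fits keys[i..j] within eps
                break
            lo_slope, hi_slope, j = new_lo, new_hi, j + 1
        slope = 0.0 if j == i + 1 else (lo_slope + hi_slope) / 2
        segments.append(Segment(keys[i], slope, i))
        i = j
    return segments


class PGMSketch:
    """Hypothetical recursive index: levels[l] holds segments over arrays[l]."""

    def __init__(self, keys, eps=4):
        data = list(keys)
        assert data and all(a < b for a, b in zip(data, data[1:])), "sorted, distinct keys"
        assert eps >= 1, "eps >= 1 lets every segment cover two keys, so levels shrink"
        self.eps = eps
        self.arrays = [data]  # arrays[0] is the data; arrays[l] = first keys of level l-1
        self.levels = [build_segments(data, eps)]
        while len(self.levels[-1]) > 1:  # recurse until one model remains
            self.arrays.append([s.key for s in self.levels[-1]])
            self.levels.append(build_segments(self.arrays[-1], eps))

    def _window(self, level, idx, k):
        """Clamp segment idx's prediction to its covered range; return [lo, hi)."""
        segs, arr = self.levels[level], self.arrays[level]
        seg = segs[idx]
        end = segs[idx + 1].intercept - 1 if idx + 1 < len(segs) else len(arr) - 1
        pos = int(min(max(seg.predict(k), seg.intercept), end))
        return max(0, pos - self.eps - 1), min(len(arr), pos + self.eps + 2)

    def lower_bound(self, k):
        """Position of the first key >= k, as bisect.bisect_left would return."""
        idx = 0  # the root level has a single segment
        for level in range(len(self.levels) - 1, 0, -1):
            lo, hi = self._window(level, idx, k)
            # Predecessor of k among this level's keys selects a segment one level down.
            idx = max(bisect_right(self.arrays[level], k, lo, hi) - 1, 0)
        lo, hi = self._window(0, idx, k)
        return bisect_left(self.arrays[0], k, lo, hi)
```

For example, `PGMSketch([2, 3, 5, 7, 11, 13], eps=1).lower_bound(6)` returns 3, the same as `bisect.bisect_left`. Every binary search in this sketch is confined to a window of O(ε) slots. That confinement is what makes the cost per level independent of the data size. How many segments each level holds depends on the segmentation algorithm, and this is where the greedy stand-in falls short of the optimal one.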
