Lower Bounds for the Algorithmic Complexity of Learned Indexes
Luis Alberto Croquevielle, Roman Sokolovskii, Thomas Heinis

TL;DR
This paper establishes theoretical lower bounds on the query time of learned index structures by analyzing their approximation capabilities and space overhead, revealing fundamental limitations of current methods.
Contribution
It introduces a general framework for deriving lower bounds on learned indexes, connecting approximation theory with database query efficiency.
Findings
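The lower bounds on query time are stated in terms of space overhead and depend on the model class and on the distribution from which the attribute is drawn.
Specific bounds are derived for piecewise linear and piecewise constant models.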
The framework highlights inherent space-time tradeoffs in learned indexes.
Abstract
Learned index structures aim to accelerate queries by training machine learning models to approximate the rank function associated with a database attribute. While effective in practice, their theoretical limitations are not fully understood. We present a general framework for proving lower bounds on query time for learned indexes, expressed in terms of their space overhead and parameterized by the model class used for approximation. Our formulation captures a broad family of learned indexes, including most existing designs, as piecewise model-based predictors. We solve the problem of lower bounding query time in two steps: first, we use probabilistic tools to control the effect of sampling when the database attribute is drawn from a probability distribution. Then, we analyze the approximation-theoretic problem of how to optimally represent a cumulative distribution function with…
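To make the object of study concrete, here is a minimal sketch of a piecewise model-based predictor for the rank function. It is not the paper's construction: the class name `PiecewiseLinearIndex`, the equal-size segmentation, and the endpoint interpolation are all assumptions chosen for brevity, and keys are assumed to be distinct numbers. Each segment stores a linear model and its maximum prediction error, and a lookup searches only the window that the error bound allows.

```python
import bisect


class PiecewiseLinearIndex:
    """Toy learned index: one linear model per equal-size segment of sorted keys."""

    def __init__(self, keys, num_segments):
        self.keys = sorted(set(keys))  # assumption: distinct numeric keys
        n = len(self.keys)
        size = max(1, -(-n // num_segments))  # ceil(n / num_segments)
        self.segments = []  # (first_key, slope, first_position, max_error)
        for start in range(0, n, size):
            end = min(start + size, n) - 1
            k0, k1 = self.keys[start], self.keys[end]
            # Line through the segment's two endpoints (key -> position).
            slope = (end - start) / (k1 - k0) if k1 > k0 else 0.0
            # Largest gap between predicted and true position in this segment.
            err = max(
                abs(start + slope * (self.keys[i] - k0) - i)
                for i in range(start, end + 1)
            )
            self.segments.append((k0, slope, start, err))
        self.first_keys = [seg[0] for seg in self.segments]
        self.max_error = max((seg[3] for seg in self.segments), default=0.0)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        if not self.segments:
            return None
        n = len(self.keys)
        # Route to the last segment whose first key is <= the query key.
        j = max(0, bisect.bisect_right(self.first_keys, key) - 1)
        k0, slope, start, err = self.segments[j]
        pred = start + slope * (key - k0)
        # Binary search restricted to the window allowed by the error bound.
        lo = min(max(0, int(pred - err)), n)
        hi = max(lo, min(n, int(pred + err) + 2))
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        return pos if pos < n and self.keys[pos] == key else None


keys = [i * i for i in range(1000)]
coarse = PiecewiseLinearIndex(keys, num_segments=4)
fine = PiecewiseLinearIndex(keys, num_segments=64)
```

In this toy setting, `fine` stores more segments than `coarse` and in exchange has a smaller `max_error`, so each lookup searches a narrower window. This is an informal illustration of the kind of space-time tradeoff that the lower bounds constrain, not a reproduction of any result in the paper.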
Taxonomy
Topics: Data Quality and Management · Machine Learning and Algorithms · Advanced Database Systems and Queries
