On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing
Sepanta Zeighami, Cyrus Shahabi

TL;DR
This paper provides a theoretical foundation for learned indexes, showing that under mild assumptions on the data distribution they can achieve sub-logarithmic expected query time, an asymptotic improvement over the $O(\log n)$ query time of traditional structures such as B-trees and binary search.
Contribution
It proves that learned indexes can theoretically achieve $O(\log \log n)$ and even $O(1)$ expected query time with near-linear space, under mild distribution assumptions.
Findings
Learned indexes can answer queries in $O(\log \log n)$ expected time.
With slight space overhead, learned indexes can achieve $O(1)$ expected query time.
Theoretical results match empirical observations of learned indexes' performance.
Abstract
A fundamental problem in data management is to find the elements in an array that match a query. Learned indexes have recently been used extensively to solve this problem: they learn a model that predicts the location of items in the array. They have been shown empirically to outperform non-learned methods (e.g., B-trees or binary search, which answer queries in $O(\log n)$ time) by orders of magnitude. However, the success of learned indexes has not been theoretically justified. The only existing attempt shows the same query time of $O(\log n)$, but with a constant-factor improvement in space complexity over non-learned methods, under some assumptions on the data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on data distribution, and the same space complexity as non-learned methods, learned indexes can answer queries in …
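To make the idea of "learning a model to predict the location of items" concrete, here is a minimal Python sketch. It is an illustration only and not the construction analyzed in the paper: the class name `LinearLearnedIndex`, the choice of a single least-squares linear model, and the worst-case error window are all assumptions made for this example. The model predicts a position from the key, and a binary search restricted to a window around that prediction corrects the model's error.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical minimal learned index: one linear model plus a bounded local search.

    This is an illustrative sketch, not the index structure from the paper.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("keys must be non-empty")

        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_x = sum(self.keys) / n
        mean_y = (n - 1) / 2
        var_x = sum((x - mean_x) ** 2 for x in self.keys)
        cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(self.keys))
        self.slope = cov / var_x if var_x > 0 else 0.0
        self.intercept = mean_y - self.slope * mean_x

        # Largest prediction error over the stored keys; it bounds the search window.
        self.max_err = max(
            abs(i - self._predict(x)) for i, x in enumerate(self.keys)
        )

    def _predict(self, key):
        # Predicted array position, clamped to the valid index range.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the index of the first occurrence of key, or None if absent."""
        pred = self._predict(key)
        lo = max(pred - self.max_err, 0)
        hi = min(pred + self.max_err + 1, len(self.keys))
        # Binary search only inside the window [lo, hi) around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


# Usage: build over sorted keys, then query present and absent keys.
index = LinearLearnedIndex([2 * i + 1 for i in range(1000)])
position = index.lookup(1999)
missing = index.lookup(4)
```

In this sketch the cost of a lookup is governed by the window size `2 * max_err + 1` rather than by the array length, so it is fast when the model fits the key distribution well and degrades toward ordinary binary search when it does not. The paper's sub-logarithmic bounds rely on its own model construction and distribution assumptions, which this single-model sketch does not reproduce.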
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Data Management and Algorithms
