The Case for Learned Index Structures

Tim Kraska; Alex Beutel; Ed H. Chi; Jeffrey Dean; Neoklis Polyzotis

arXiv:1712.01208·cs.DB·May 1, 2018

TL;DR

This paper advocates replacing traditional index structures with learned models, including neural networks, that predict the position or existence of a record from its key. The authors suggest this could change how data management systems are designed.

Contribution

It introduces the concept of learned indexes, analyzes theoretically the conditions under which they outperform traditional index structures, and reports initial results against cache-optimized B-Trees.

Findings

01. Neural network-based indexes outperform cache-optimized B-Trees by up to 70% in speed.

02. Learned indexes can reduce memory usage by an order of magnitude.

03. The approach shows promise across multiple real-world datasets.

Abstract

Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that by using neural nets we are…
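
The "index as a model" premise for a sorted array can be illustrated with a minimal Python sketch. It is not the paper's implementation: it swaps in a single least-squares line for the neural networks the abstract mentions, and the names `fit_linear` and `LearnedIndexSketch` are hypothetical. It assumes numeric, distinct, non-empty keys. The model predicts a position, the worst prediction errors seen at build time bound a search window, and a binary search inside that window corrects the guess.

```python
import bisect


def fit_linear(keys):
    """Least-squares fit of position ~ slope * key + intercept over sorted keys."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    return slope, mean_p - slope * mean_k


class LearnedIndexSketch:
    """Illustrative only: a linear model stands in for the learned model."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        self.slope, self.intercept = fit_linear(self.keys)
        # Worst under- and over-prediction on the stored keys; these bound
        # the window that a lookup has to search.
        errs = [i - self._predict(k) for i, k in enumerate(self.keys)]
        self.min_err = min(errs)
        self.max_err = max(errs)

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        n = len(self.keys)
        pred = self._predict(key)
        lo = min(max(0, pred + self.min_err), n)
        hi = min(max(lo, pred + self.max_err + 1), n)
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        if pos < n and self.keys[pos] == key:
            return pos
        return None


index = LearnedIndexSketch([3, 8, 15, 16, 23, 42, 108, 4000])
position = index.lookup(23)  # position of 23 in the sorted array
```

The closer the key distribution is to what the model can represent, the narrower the error window and the less correction work a lookup needs. A single line fits skewed keys such as the 4000 outlier above poorly, which widens the window.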
