DILI: A Distribution-Driven Learned Index (Extended version)
Pengfei Li, Hua Lu, Rong Zhu, Bolin Ding, Long Yang, Gang Pan

TL;DR
DILI is a distribution-driven learned index tree for in-memory one-dimensional keys. Each node uses a linear regression model, and the construction balances the number of leaf nodes against the height of the tree to keep key searches fast.
Contribution
The paper proposes a new index tree that uses the key distribution and per-node linear regression models to improve search performance and support dynamic updates.
Findings
DILI outperforms existing learned indexes on various workloads.
The index balances the number of leaf nodes against the height of the tree, the two factors the abstract calls critical, for faster searches.
Efficient insert and delete algorithms maintain performance during updates.
Abstract
Targeting in-memory one-dimensional search keys, we propose a novel DIstribution-driven Learned Index tree (DILI), where a concise and computation-efficient linear regression model is used for each node. An internal node's key range is equally divided by its child nodes such that a key search enjoys perfect model prediction accuracy to find the relevant leaf node. A leaf node uses machine learning models to generate searchable data layout and thus accurately predicts the data record position for a key. To construct DILI, we first build a bottom-up tree with linear regression models according to global and local key distributions. Using the bottom-up tree, we build DILI in a top-down manner, individualizing the fanouts for internal nodes according to local distributions. DILI strikes a good balance between the number of leaf nodes and the height of the tree, two critical factors of key…
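The equal division of an internal node's key range can be illustrated with a minimal sketch. It is not the authors' implementation. It only shows why a linear model gives exact routing when all child ranges have the same width. The names `InternalNode`, `Leaf`, `slot_of`, `route`, and `build` are hypothetical, the fanout is fixed rather than individualized per node, and the leaf uses plain binary search as a stand-in for the model-generated layout the abstract describes.

```python
from bisect import bisect_left


class Leaf:
    """Stand-in leaf: a sorted list with binary search (an assumption;
    the paper's leaves use a learned model to lay out and locate records)."""

    def __init__(self):
        self.keys = []

    def contains(self, key):
        i = bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key


class InternalNode:
    """Internal node whose key range [lb, ub) is split into equal-width child ranges."""

    def __init__(self, lb, ub, fanout):
        self.children = [Leaf() for _ in range(fanout)]
        # Linear model: slot = a * key + b, so that lb maps to 0 and ub maps to fanout.
        self.a = fanout / (ub - lb)
        self.b = -self.a * lb

    def slot_of(self, key):
        # Truncate the model output and clamp it to a valid child index.
        slot = int(self.a * key + self.b)
        return min(max(slot, 0), len(self.children) - 1)

    def route(self, key):
        return self.children[self.slot_of(key)]


def build(keys, lb, ub, fanout):
    """Build a one-level tree, placing each key with the same model used for lookups."""
    root = InternalNode(lb, ub, fanout)
    for k in sorted(keys):
        root.route(k).keys.append(k)
    return root


keys = [3, 10, 27, 49, 50, 76, 90, 99]
root = build(keys, lb=0, ub=100, fanout=4)
found = all(root.route(k).contains(k) for k in keys)
```

Because the same linear model places and locates every key, routing at the internal node needs no correction step. The cost of a lookup then depends on how many levels are traversed and how the leaf is searched.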
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
