From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees

Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau

arXiv:2005.14213 · cs.DB · November 3, 2020 · 34 cites

TL;DR

Bourbon is a log-structured merge (LSM) tree that uses machine learning to speed up lookups: it learns key distributions and improves lookup performance by 1.23x to 1.78x over state-of-the-art production LSMs.

Contribution

The paper presents BOURBON, an LSM tree design that uses greedy piecewise linear regression to learn key distributions and a cost-benefit strategy to decide when learning is worthwhile.

Findings

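01. Bourbon improves lookup performance by 1.23x to 1.78x compared to state-of-the-art production LSMs.

02. It learns key distributions with greedy piecewise linear regression, which enables fast lookups with minimal computation.

03. The improvements are shown through experiments on both synthetic and real-world datasets.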

Abstract

We introduce BOURBON, a log-structured merge (LSM) tree that utilizes machine learning to provide fast lookups. We base the design and implementation of BOURBON on empirically-grounded principles that we derive through careful analysis of LSM design. BOURBON employs greedy piecewise linear regression to learn key distributions, enabling fast lookup with minimal computation, and applies a cost-benefit strategy to decide when learning will be worthwhile. Through a series of experiments on both synthetic and real-world datasets, we show that BOURBON improves lookup performance by 1.23x-1.78x as compared to state-of-the-art production LSMs.
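To make the idea of learning a key distribution with greedy piecewise linear regression more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: it uses a simplified "shrinking cone" greedy fit, assumes the keys are sorted, distinct integers held in memory, and the names `build_segments`, `lookup`, and `delta` (an integer error bound on the predicted position) are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_segments(keys, delta):
    """Greedily cover sorted, distinct keys with line segments.

    Each segment is (start_key, slope, start_pos). For every key in a
    segment, start_pos + slope * (key - start_key) is within `delta`
    of the key's true index. Simplified variant, assumed for illustration.
    """
    if not keys:
        return []
    segments = []
    x0, y0 = keys[0], 0                    # anchor point of the open segment
    lo, hi = float("-inf"), float("inf")   # slopes still feasible for it
    for pos in range(1, len(keys)):
        dx = keys[pos] - x0
        new_lo = max(lo, (pos - delta - y0) / dx)
        new_hi = min(hi, (pos + delta - y0) / dx)
        if new_lo > new_hi:
            # No single slope fits this point as well: close the segment
            # and start a new one anchored at the current key.
            segments.append((x0, (lo + hi) / 2, y0))
            x0, y0 = keys[pos], pos
            lo, hi = float("-inf"), float("inf")
        else:
            lo, hi = new_lo, new_hi
    # A trailing one-point segment has no slope constraint; use 0.
    slope = 0.0 if lo == float("-inf") else (lo + hi) / 2
    segments.append((x0, slope, y0))
    return segments


def lookup(keys, segments, delta, key):
    """Return the index of `key` in `keys`, or None if it is absent."""
    starts = [seg[0] for seg in segments]  # a real index would cache this
    i = bisect_right(starts, key) - 1      # segment whose range holds key
    if i < 0:
        return None
    x0, slope, y0 = segments[i]
    pos = round(y0 + slope * (key - x0))   # predicted position
    n = len(keys)
    lo = max(0, min(n, pos - delta))       # clamp the search window
    hi = max(lo, min(n, pos + delta + 1))
    j = bisect_left(keys, key, lo, hi)     # search only within the window
    return j if j < hi and keys[j] == key else None


keys = [i * i for i in range(1, 300)]
segments = build_segments(keys, delta=4)
index_of_100 = lookup(keys, segments, 4, 100)
```

The point of the sketch is that a lookup evaluates one line and then searches a window of at most `2 * delta + 1` positions, instead of searching the whole sorted run. A larger `delta` yields fewer segments but a wider final search, which is the kind of trade-off a learned index has to balance.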

Taxonomy

Topics: Data Quality and Management · Time Series Analysis and Forecasting · Data Mining Algorithms and Applications

Methods: Linear Regression