Learned LSM-trees: Two Approaches Using Learned Bloom Filters

Nicholas Fidalgo; Puyuan Ye

arXiv:2508.00882 · cs.DS · August 5, 2025

TL;DR

This paper integrates machine learning models into the LSM-tree lookup path to reduce read latency and memory usage, and demonstrates two approaches with different trade-offs between latency, memory, and correctness.

Contribution

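The paper introduces two methods for embedding learned models into LSM-trees: a classifier that bypasses Bloom filter probes for irrelevant levels to reduce read latency, and compact learned models with small backup filters that replace traditional Bloom filters to reduce memory footprint.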

Findings

01 A classifier reduces GET latency by up to 2.28x

02 Learned Bloom filters with backup filters avoid false negatives and cut memory by 70-80%

03 The two approaches trade off latency, memory, and correctness
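
Finding 01 concerns a classifier placed on the GET path. As a minimal sketch of that idea, assuming the classifier decides per level whether a lookup should consider that level at all, the Python below skips both the filter probe and the data access for levels predicted irrelevant. The names Level, get, toy_classifier, and LEVEL_PREFIXES are hypothetical, a plain set stands in for a real Bloom filter, and nothing here is taken from the paper's implementation.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical rule for the toy classifier: level i is assumed to hold
# only keys that start with LEVEL_PREFIXES[i].
LEVEL_PREFIXES = ["a", "b"]


class Level:
    """One simplified LSM-tree level: key-value data plus a membership filter."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)
        self.filter_keys = set(self.data)  # stand-in for a real Bloom filter
        self.filter_probes = 0             # how many times the filter was consulted

    def may_contain(self, key: str) -> bool:
        self.filter_probes += 1
        return key in self.filter_keys


def toy_classifier(key: str, level_idx: int) -> bool:
    """Assumed stand-in for a trained model: is this level relevant for the key?"""
    return level_idx < len(LEVEL_PREFIXES) and key.startswith(LEVEL_PREFIXES[level_idx])


def get(levels: List[Level], key: str,
        is_relevant: Callable[[str, int], bool]) -> Optional[str]:
    """Search levels from newest to oldest, bypassing levels marked irrelevant."""
    for idx, level in enumerate(levels):
        if not is_relevant(key, idx):
            continue  # bypass: no filter probe and no data access for this level
        if level.may_contain(key) and key in level.data:
            return level.data[key]
    return None
```

With `levels = [Level({"a1": "x"}), Level({"b1": "y"})]`, calling `get(levels, "b1", toy_classifier)` finds the value in the second level without probing the first level's filter. In this sketch a wrong "irrelevant" prediction makes `get` miss a key that exists, which is one way a latency gain can trade against correctness.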

Abstract

Modern key-value stores rely heavily on Log-Structured Merge (LSM) trees for write optimization, but this design introduces significant read amplification. Auxiliary structures like Bloom filters help, but impose memory costs that scale with tree depth and dataset size. Recent advances in learned data structures suggest that machine learning models can augment or replace these components, trading handcrafted heuristics for data-adaptive behavior. In this work, we explore two approaches for integrating learned predictions into the LSM-tree lookup path. The first uses a classifier to selectively bypass Bloom filter probes for irrelevant levels, aiming to reduce average-case query latency. The second replaces traditional Bloom filters with compact learned models and small backup filters, targeting memory footprint reduction without compromising correctness. We implement both methods atop a…
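
The second approach pairs a learned model with a small backup filter. A minimal sketch of the standard learned Bloom filter construction follows, assuming a scoring model and a threshold: keys the model scores below the threshold are stored in a conventional Bloom filter, so every inserted key is still reported as present. The names BloomFilter, LearnedBloomFilter, and toy_score, the threshold, and the filter sizes are illustrative assumptions and do not come from the paper.

```python
import hashlib
from typing import Callable, Iterable, Iterator


class BloomFilter:
    """Plain Bloom filter over strings, backed by a bytearray of bits."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class LearnedBloomFilter:
    """Model plus backup filter; inserted keys are never reported absent."""

    def __init__(self, keys: Iterable[str], score: Callable[[str], float],
                 threshold: float, backup_bits: int = 1024,
                 backup_hashes: int = 3) -> None:
        self.score = score
        self.threshold = threshold
        self.backup = BloomFilter(backup_bits, backup_hashes)
        self.backup_count = 0
        for key in keys:
            # Keys the model would miss go into the backup filter.
            if score(key) < threshold:
                self.backup.add(key)
                self.backup_count += 1

    def may_contain(self, key: str) -> bool:
        return self.score(key) >= self.threshold or key in self.backup


def toy_score(key: str) -> float:
    """Assumed stand-in for a trained model: high score for 'user:' keys."""
    return 0.9 if key.startswith("user:") else 0.1
```

For example, `LearnedBloomFilter(["user:1", "user:2", "order:7"], toy_score, threshold=0.5)` places only `"order:7"` in the backup filter. Because the backup filter holds just the keys the model misses, it can be sized smaller than a filter over all keys, which is where a memory saving would come from. False positives remain possible from both the model and the backup filter.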

Taxonomy

Topics: Caching and Content Delivery · Advanced Data Storage Technologies · Cloud Computing and Resource Management