Constructing and Analyzing the LSM Compaction Design Space (Updated Version)

Subhadeep Sarkar; Dimitris Staratzis; Zichen Zhu; Manos Athanassoulis

arXiv:2202.04522 · cs.DB · March 1, 2022


TL;DR

This paper defines a formal design space for LSM-tree compactions, introduces four primitives that characterize any compaction strategy, and experimentally evaluates ten strategies to guide performance-oriented design choices.

Contribution

The paper introduces four primitives that formally define any LSM-compaction strategy and experimentally analyzes ten strategies to inform design decisions.
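In the paper, the four primitives are the compaction trigger (when to compact), the data layout (how data is organized across levels), the compaction granularity (how much data one compaction moves), and the data movement policy (which data is moved). The sketch below encodes a strategy as a tuple of these four choices. It is a hypothetical encoding: the class names and the enum members are illustrative examples, not the paper's full catalog of options.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    # When a compaction starts (illustrative values only).
    LEVEL_SATURATION = "level saturation"
    SORTED_RUN_COUNT = "number of sorted runs"
    SPACE_AMPLIFICATION = "space amplification"


class DataLayout(Enum):
    # How data is organized across the levels.
    LEVELING = "leveling"
    TIERING = "tiering"
    HYBRID = "hybrid"


class Granularity(Enum):
    # How much data a single compaction moves.
    LEVEL = "entire level"
    SORTED_RUN = "sorted run"
    FILE = "single file"


class MovementPolicy(Enum):
    # Which data is picked when only part of a level is compacted.
    NOT_APPLICABLE = "n/a (whole level)"
    ROUND_ROBIN = "round-robin"
    LEAST_OVERLAP = "least overlap with next level"
    OLDEST_FILE = "oldest file"


@dataclass(frozen=True)
class CompactionStrategy:
    # Hypothetical encoding: one strategy = one choice per primitive.
    trigger: Trigger
    layout: DataLayout
    granularity: Granularity
    movement: MovementPolicy

    def describe(self) -> str:
        return (f"{self.layout.value} layout, compacts {self.granularity.value} "
                f"on {self.trigger.value}, picks by {self.movement.value}")


# Example: a partial (file-granularity) leveling strategy.
partial_leveling = CompactionStrategy(
    Trigger.LEVEL_SATURATION, DataLayout.LEVELING,
    Granularity.FILE, MovementPolicy.LEAST_OVERLAP)
print(partial_leveling.describe())
# leveling layout, compacts single file on level saturation, picks by least overlap with next level
```

Because the dataclass is frozen, strategies are hashable values, so a set of them can stand in for a sampled region of the design space.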

Findings

01. Identified key trade-offs in compaction strategies
02. Provided 12 observations on strategy performance
03. Suggested guidelines for navigating the compaction design space
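This page does not list which trade-offs the paper identifies, so as a concrete illustration here is one well-known example: leveling versus tiering. The toy model below counts every entry written to disk and divides by the number of entries ingested. It is a hypothetical simplification, not the paper's cost model: unique keys, no updates or deletes, whole-level compactions, level i holding up to `buffer_entries * size_ratio**i` entries under leveling, and up to `size_ratio` runs per level under tiering.

```python
def write_amplification(layout: str, flushes: int,
                        buffer_entries: int, size_ratio: int) -> float:
    """Entries written to disk / entries ingested, under a toy cost model."""
    levels: list[list[int]] = []  # levels[i] = sizes of the sorted runs in level i+1
    written = 0
    for _ in range(flushes):
        run = buffer_entries  # each buffer flush produces one sorted run
        i = 0
        while True:
            if i == len(levels):
                levels.append([])
            capacity = buffer_entries * size_ratio ** (i + 1)
            if layout == "leveling":
                # Merge the incoming run with the level's single run;
                # the merge rewrites all of that data.
                merged = run + sum(levels[i])
                written += merged
                if merged <= capacity:
                    levels[i] = [merged]
                    break
                levels[i] = []  # level saturated: push the merged run down
                run = merged
            elif layout == "tiering":
                # Append the incoming run unchanged; only that run is written.
                levels[i].append(run)
                written += run
                if len(levels[i]) < size_ratio:
                    break
                run = sum(levels[i])  # level holds size_ratio runs: merge, push down
                levels[i] = []
            else:
                raise ValueError(f"unknown layout: {layout}")
            i += 1
    return written / (flushes * buffer_entries)


print("leveling:", write_amplification("leveling", 64, 1, 4))  # leveling: 7.1875
print("tiering:", write_amplification("tiering", 64, 1, 4))    # tiering: 4.0
```

Under these assumptions tiering writes each entry about once per level, while leveling rewrites a level's data on every merge into it. The flip side, not modeled here, is that tiering leaves up to `size_ratio` sorted runs per level for a lookup to probe.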

Abstract

Log-structured merge (LSM) trees offer efficient ingestion by appending incoming data, and thus, are widely used as the storage layer of production NoSQL data stores. To enable competitive read performance, LSM-trees periodically re-organize data to form a tree with levels of exponentially increasing capacity, through iterative compactions. Compactions fundamentally influence the performance of an LSM-engine in terms of write amplification, write throughput, point and range lookup performance, space amplification, and delete performance. Hence, choosing the appropriate compaction strategy is crucial and, at the same time, hard as the LSM-compaction design space is vast, largely unexplored, and has not been formally defined in the literature. As a result, most LSM-based engines use a fixed compaction strategy, typically hand-picked by an engineer, which decides how and when to compact…
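The "levels of exponentially increasing capacity" mentioned above can be computed directly. The sketch below assumes a common convention, which is not stated on this page: with an in-memory buffer of B entries and a size ratio T, level i (counting from 1) holds B·T^i entries. The function names are hypothetical.

```python
def level_capacities(buffer_entries: int, size_ratio: int, num_levels: int) -> list[int]:
    # Assumption: level i (1-indexed) holds buffer_entries * size_ratio**i entries.
    return [buffer_entries * size_ratio ** i for i in range(1, num_levels + 1)]


def levels_needed(total_entries: int, buffer_entries: int, size_ratio: int) -> int:
    # Smallest number of levels whose combined capacity holds all entries.
    levels, capacity = 0, 0
    while capacity < total_entries:
        levels += 1
        capacity += buffer_entries * size_ratio ** levels
    return levels


print(level_capacities(1_000, 10, 4))       # [10000, 100000, 1000000, 10000000]
print(levels_needed(5_000_000, 1_000, 10))  # 4
```

Because capacity grows geometrically, the number of levels grows only logarithmically with the data size, which bounds how many levels a lookup or a compaction cascade can touch.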
