Constructing and Analyzing the LSM Compaction Design Space (Updated Version)
Subhadeep Sarkar, Dimitris Staratzis, Zichen Zhu, Manos Athanassoulis

TL;DR
This paper formally defines the design space of LSM-tree compactions, introduces four primitives that characterize any compaction strategy, and experimentally evaluates ten strategies to guide compaction design choices.
Contribution
It introduces four primitives to formally define any LSM-compaction strategy and provides an experimental analysis of ten strategies to inform design decisions.
Findings
Identified key trade-offs in compaction strategies
Provided 12 observations on strategy performance
Suggested guidelines for navigating the compaction design space
Abstract
Log-structured merge (LSM) trees offer efficient ingestion by appending incoming data, and thus, are widely used as the storage layer of production NoSQL data stores. To enable competitive read performance, LSM-trees periodically re-organize data to form a tree with levels of exponentially increasing capacity, through iterative compactions. Compactions fundamentally influence the performance of an LSM-engine in terms of write amplification, write throughput, point and range lookup performance, space amplification, and delete performance. Hence, choosing the appropriate compaction strategy is crucial and, at the same time, hard as the LSM-compaction design space is vast, largely unexplored, and has not been formally defined in the literature. As a result, most LSM-based engines use a fixed compaction strategy, typically hand-picked by an engineer, which decides how and when to compact…
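The abstract's "levels of exponentially increasing capacity" can be made concrete with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: the names `buffer_entries`, `size_ratio`, `level_capacities`, and `levels_needing_compaction` are hypothetical, and the rule "compact a level once it exceeds its capacity" is only one simple way to decide when to compact.

```python
def level_capacities(buffer_entries: int, size_ratio: int, num_levels: int) -> list[int]:
    """Capacity (in entries) of each on-disk level, assuming level i
    (1-indexed) holds buffer_entries * size_ratio**i entries."""
    return [buffer_entries * size_ratio ** i for i in range(1, num_levels + 1)]


def levels_needing_compaction(level_sizes: list[int], capacities: list[int]) -> list[int]:
    """Return the 1-indexed levels whose current size exceeds capacity.
    Assumed trigger: a level is compacted once it is over capacity."""
    return [
        i + 1
        for i, (size, cap) in enumerate(zip(level_sizes, capacities))
        if size > cap
    ]


# Hypothetical tree: 1,000-entry buffer, size ratio 10, three levels.
capacities = level_capacities(buffer_entries=1000, size_ratio=10, num_levels=3)
saturated = levels_needing_compaction([12_000, 50_000, 900_000], capacities)
print(capacities, saturated)
```

With a size ratio of 10, each level holds ten times as many entries as the one above it, so in this example only the first level is over capacity and would be picked for compaction.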
