Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty
Andy Huynh, Harshal A. Chaudhari, Evimaria Terzi, Manos Athanassoulis

TL;DR
Endure introduces a robust tuning paradigm for LSM trees that optimizes worst-case throughput under workload uncertainty, outperforming traditional tuning strategies especially in noisy, unpredictable cloud environments.
Contribution
The paper proposes a novel robust tuning framework for LSM trees that accounts for workload variability, improving performance stability and throughput in uncertain cloud settings.
Findings
Up to 5× throughput improvement under workload uncertainty.
Robust tuning maintains near-optimal performance when workload matches expectations.
Endure outperforms classical tuning strategies across diverse noisy workloads.
Abstract
Log-Structured Merge trees (LSM trees) are increasingly used as the storage engines behind several data systems, frequently deployed in the cloud. Similar to other database architectures, LSM trees take into account information about the expected workload (e.g., reads vs. writes, point vs. range queries) to optimize their performance via tuning. Operating in shared infrastructure like the cloud, however, comes with a degree of workload uncertainty due to multi-tenancy and the fast-evolving nature of modern applications. Systems with static tuning discount the variability of such hybrid workloads and hence provide an inconsistent and overall suboptimal performance. To address this problem, we introduce Endure - a new paradigm for tuning LSM trees in the presence of workload uncertainty. Specifically, we focus on the impact of the choice of compaction policies, size-ratio, and memory…
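The idea of tuning for the worst case rather than for a single expected workload can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy cost model, the data size, the candidate size ratios, and the uncertainty region (read fractions within `rho` of the expectation) are hypothetical and are not taken from the paper, whose actual cost model and uncertainty definition are not given in this summary. Memory allocation is left out entirely.

```python
from itertools import product

N_ENTRIES = 1_000_000          # hypothetical data size, in entries
POLICIES = ("leveling", "tiering")
SIZE_RATIOS = range(2, 11)     # hypothetical candidate size ratios T


def num_levels(size_ratio):
    """Levels needed when each level holds size_ratio times the previous one."""
    levels, capacity = 1, size_ratio
    while capacity < N_ENTRIES:
        capacity *= size_ratio
        levels += 1
    return levels


def cost(tuning, read_frac):
    """Toy expected I/O cost per operation (illustrative, not the paper's model)."""
    policy, size_ratio = tuning
    levels = num_levels(size_ratio)
    if policy == "leveling":   # one run per level: cheap reads, costly merges
        read_cost, write_cost = levels, size_ratio * levels
    else:                      # tiering: up to T runs per level, cheap merges
        read_cost, write_cost = size_ratio * levels, levels
    return read_frac * read_cost + (1.0 - read_frac) * write_cost


def worst_case_cost(tuning, expected_read_frac, rho):
    """Max cost over read fractions within rho of the expectation.

    The toy cost is linear in read_frac, so the maximum sits at an endpoint.
    """
    lo = max(0.0, expected_read_frac - rho)
    hi = min(1.0, expected_read_frac + rho)
    return max(cost(tuning, lo), cost(tuning, hi))


def nominal_tuning(expected_read_frac):
    """Pick the tuning that is cheapest for the expected workload only."""
    return min(product(POLICIES, SIZE_RATIOS),
               key=lambda t: cost(t, expected_read_frac))


def robust_tuning(expected_read_frac, rho):
    """Pick the tuning whose worst case over the uncertainty region is cheapest."""
    return min(product(POLICIES, SIZE_RATIOS),
               key=lambda t: worst_case_cost(t, expected_read_frac, rho))


nominal = nominal_tuning(0.9)
robust = robust_tuning(0.9, 0.4)
for name, tuning in (("nominal", nominal), ("robust", robust)):
    print(name, tuning, cost(tuning, 0.9), worst_case_cost(tuning, 0.9, 0.4))
```

Under these toy numbers the two strategies choose different size ratios: the robust choice costs slightly more when the workload matches the expectation, but its worst-case cost over the uncertainty region is lower. That trade-off is the one the summary describes, although the magnitudes here say nothing about Endure's measured results.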
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
