Keigo: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy (Extended Version)
Rúben Adão, Zhongjie Wu, Changjun Zhou, Oana Balmau, João Paulo, Ricardo Macedo

TL;DR
Keigo is a storage middleware that speeds up log-structured merge key-value stores (LSM KVS) by placing files across heterogeneous storage devices according to each device's parallelism, I/O bandwidth, and capacity, improving throughput by up to 4x for writes and up to 18x for reads.
Contribution
Keigo introduces a novel, portable, and workload-aware storage middleware that enhances LSM KVS performance on heterogeneous storage hierarchies without extensive profiling.
Findings
Up to 4x throughput improvement for writes.
Up to 18x throughput improvement for reads.
Effective across multiple production KVS like RocksDB and LevelDB.
Abstract
We present Keigo, a concurrency- and workload-aware storage middleware that enhances the performance of log-structured merge key-value stores (LSM KVS) when they are deployed on a hierarchy of storage devices. The key observation behind Keigo is that there is no one-size-fits-all placement of data across the storage hierarchy that optimizes for all workloads. Hence, to leverage the benefits of combining different storage devices, Keigo places files across different devices based on their parallelism, I/O bandwidth, and capacity. We introduce three techniques: concurrency-aware data placement, persistent read-only caching, and context-based I/O differentiation. Keigo is portable across different LSMs, is adaptable to dynamic workloads, and does not require extensive profiling. Our system enables established production KVS such as RocksDB, LevelDB, and Speedb to benefit from…
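To make the idea of placing files by parallelism, I/O bandwidth, and capacity concrete, here is a minimal Python sketch of a greedy placement heuristic. It is not Keigo's algorithm, which the abstract does not specify; the `Device` class, the `place_file` function, the device names, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical description of one tier in the storage hierarchy."""
    name: str
    bandwidth_mbps: float  # assumed I/O bandwidth of the device
    max_parallel: int      # assumed number of concurrent streams it sustains
    capacity_mb: int       # total capacity
    used_mb: int = 0
    active_streams: int = 0

    def can_accept(self, size_mb: int) -> bool:
        # A device qualifies only if it has both free space and a free stream.
        has_space = self.used_mb + size_mb <= self.capacity_mb
        has_stream = self.active_streams < self.max_parallel
        return has_space and has_stream


def place_file(devices, size_mb):
    """Greedy choice: the highest-bandwidth device that can still accept the file."""
    candidates = [d for d in devices if d.can_accept(size_mb)]
    if not candidates:
        return None  # no device can take the file right now
    best = max(candidates, key=lambda d: d.bandwidth_mbps)
    best.used_mb += size_mb
    best.active_streams += 1
    return best.name


# Illustrative hierarchy: a fast but low-parallelism tier, then slower tiers.
devices = [
    Device("pmem", bandwidth_mbps=2000, max_parallel=2, capacity_mb=200),
    Device("nvme", bandwidth_mbps=1500, max_parallel=8, capacity_mb=1000),
    Device("sata", bandwidth_mbps=500, max_parallel=4, capacity_mb=10000),
]

# Four concurrent 64 MB file writes (for example, flush or compaction outputs).
placements = [place_file(devices, 64) for _ in range(4)]
print(placements)
```

In this toy setup the fastest tier takes the first two files and later files spill to the next tier once its stream limit is reached, which illustrates why a placement that ignores concurrency can overload the fastest device. A real system would also release streams when writes finish and would account for workload type, which this sketch omits.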
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed systems and fault tolerance · Caching and Content Delivery
