Mycelium: A Transformation-Embedded LSM-Tree
Holly Casaletto, Jeff Lefevre, Aldrin Montana, Peter Alvaro

TL;DR
This paper introduces TE-LSM, a novel data structure that embeds data transformations into the compaction process of LSM-trees, amortizing I/O costs across compaction and transformation and improving read latency.
Contribution
We propose Transformation-Embedded LSM-trees (TE-LSM), enabling data transformations during compaction to optimize performance and prepare data for future access patterns.
Findings
Mycelium incurs only a 20% write throughput overhead.
Mycelium achieves up to a 425% improvement in read latency.
Mycelium effectively integrates data transformations into the compaction process.
Abstract
Compaction is a necessary but often costly background process in write-optimized data structures like LSM-trees; it reorganizes incoming data that is sequentially appended to logs. In this paper, we introduce Transformation-Embedded LSM-trees (TE-LSM), a novel approach that transparently embeds a variety of data transformations into the compaction process. While many others have sought to reduce the high cost of compaction, TE-LSMs leverage the opportunity to embed other useful work to amortize IO costs and amplification. We illustrate the use of a TE-LSM in Mycelium, our prototype built on top of RocksDB that extends the compaction process through a cross-column-family merging mechanism. Mycelium enables seamless integration of a transformer interface and aims to better prepare data for future accesses based on access patterns. We use Mycelium to explore three types of…
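The idea of embedding a transformation into compaction can be illustrated with a toy, in-memory sketch. This is not Mycelium's implementation and does not use the RocksDB API. The names `compact`, `Transformer`, and `split_by_prefix` are hypothetical, and the "column families" are modeled as plain dictionary entries. The sketch assumes a transformer that maps each surviving record to a target family, key, and value, so the data is rewritten and rerouted in the same pass that merges the runs.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str]                 # (key, value)
Run = List[Record]                       # one sorted run, e.g. the contents of one file
# Hypothetical transformer: maps a record to (target family, new key, new value).
Transformer = Callable[[str, str], Tuple[str, str, str]]


def compact(runs_newest_first: List[Run], transform: Transformer) -> Dict[str, Run]:
    """Merge runs, keep the newest value per key, and transform each survivor."""
    latest: Dict[str, str] = {}
    for run in runs_newest_first:
        for key, value in run:
            # Newer runs are visited first, so the first value seen for a key wins.
            latest.setdefault(key, value)

    output: Dict[str, Run] = {}
    for key in sorted(latest):
        family, new_key, new_value = transform(key, latest[key])
        output.setdefault(family, []).append((new_key, new_value))

    for run in output.values():
        run.sort()                       # a transformer may change key order
    return output


def split_by_prefix(key: str, value: str) -> Tuple[str, str, str]:
    """Example transformer (an assumption): route by key prefix, uppercase the value."""
    family = key.split(":", 1)[0]
    return family, key, value.upper()


newer = [("user:1", "alice-v2"), ("order:7", "book")]
older = [("user:1", "alice-v1"), ("user:2", "bob")]
result = compact([newer, older], split_by_prefix)
print(result)
```

In this sketch the stale `user:1` record is dropped during the merge, and the surviving records land in separate `user` and `order` outputs, which is a simplified stand-in for merging across column families.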
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · Slime Mold and Myxomycetes Research
