DumpKV: Learning based lifetime aware garbage collection for key value separation in LSM-tree
Zhutao Zhuang, Xinqi Zeng, Zhiguang Chen

TL;DR
DumpKV introduces a learning-based, lifetime-aware garbage collection method for LSM-trees that dynamically predicts key lifetimes to significantly reduce write amplification and improve efficiency.
Contribution
It presents a novel machine learning approach for adaptive garbage collection in LSM-trees, outperforming static methods in reducing write amplification.
Findings
Achieves 38-73% lower write amplification.
Utilizes lightweight models with minimal feature storage.
Operates efficiently during L0-L1 compaction.
Abstract
Key-value separation is used in LSM-trees to store large values in separate log files and thereby reduce write amplification, but it requires garbage collection to reclaim invalid values. Existing garbage collection techniques for LSM-trees typically rely on static parameters to decide when to collect obsolete values; they struggle to achieve low write amplification, and finding proper parameters for triggering garbage collection is challenging. In this work we introduce DumpKV, a learning-based, lifetime-aware garbage collection scheme with dynamic lifetime adjustment that collects garbage efficiently to achieve lower write amplification. DumpKV manages large values using a trained lightweight model, with features suitable for various applications and derived from the past write access information of keys, to predict a lifetime for each individual key and so enable efficient…
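To make the terms in the abstract concrete, the following toy sketch models the static-parameter baseline that the abstract contrasts DumpKV with. It is not DumpKV's algorithm and is not taken from the paper. The names `ValueLogFile`, `collect`, and `write_amplification`, and the use of a garbage-ratio threshold as the static parameter, are assumptions made purely for illustration. Write amplification is computed here as total bytes written divided by bytes written by the user.

```python
from dataclasses import dataclass, field


@dataclass
class ValueLogFile:
    """Hypothetical value log file: key -> value size in bytes, plus the keys still valid."""
    sizes: dict
    valid: set = field(default_factory=set)

    def total_bytes(self):
        return sum(self.sizes.values())

    def valid_bytes(self):
        return sum(self.sizes[k] for k in self.valid)

    def garbage_ratio(self):
        total = self.total_bytes()
        return 0.0 if total == 0 else 1 - self.valid_bytes() / total


def collect(files, threshold):
    """Static-threshold GC (illustrative): every file whose garbage ratio is at
    least `threshold` is dropped, and its valid values are rewritten into one
    new file. Returns (remaining files, bytes rewritten by GC)."""
    survivors = []
    relocated = {}
    rewritten = 0
    for f in files:
        if f.garbage_ratio() >= threshold:
            for key in f.valid:
                relocated[key] = f.sizes[key]
                rewritten += f.sizes[key]
        else:
            survivors.append(f)
    if relocated:
        survivors.append(ValueLogFile(relocated, set(relocated)))
    return survivors, rewritten


def write_amplification(user_bytes, gc_bytes):
    """Total bytes written (user writes plus GC rewrites) per user byte."""
    return (user_bytes + gc_bytes) / user_bytes


if __name__ == "__main__":
    files = [
        ValueLogFile({"a": 100, "b": 100}, {"a"}),       # half garbage
        ValueLogFile({"c": 100, "d": 100}, {"c", "d"}),  # no garbage
    ]
    remaining, gc_bytes = collect(files, threshold=0.5)
    print(len(remaining), gc_bytes, write_amplification(400, gc_bytes))
```

In this toy model, every valid value that garbage collection relocates counts as an extra write, so the choice of threshold directly affects write amplification. This is the tuning difficulty the abstract attributes to static-parameter schemes.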
Taxonomy
Topics: Advanced Text Analysis Techniques
Methods: Attentive Walk-Aggregating Graph Neural Network
