RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups
Chun-Ho Ng, Patrick P. C. Lee

TL;DR
RevDedup introduces reverse deduplication to optimize read performance for latest VM backups by shifting fragmentation to older data, achieving high deduplication efficiency and throughput.
Contribution
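It proposes reverse deduplication: instead of removing duplicates from new data as conventional deduplication does, RevDedup removes them from old data, so the latest backups keep a sequential layout and read quickly while fragmentation shifts to older backups.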
Findings
Achieves around 97% storage savings on a 12-week span of real-world VM images from 160 users.
Provides backup and read throughput of approximately 1 GB/s.
Maintains small metadata overhead.
Abstract
Scaling up the backup storage for an ever-increasing volume of virtual machine (VM) images is a critical issue in virtualization environments. While deduplication is known to effectively eliminate duplicates for VM image storage, it also introduces fragmentation that will degrade read performance. We propose RevDedup, a deduplication system that optimizes reads to latest VM image backups using an idea called reverse deduplication. In contrast with conventional deduplication that removes duplicates from new data, RevDedup removes duplicates from old data, thereby shifting fragmentation to old data while keeping the layout of new data as sequential as possible. We evaluate our RevDedup prototype using microbenchmark and real-world workloads. For a 12-week span of real-world VM images from 160 users, RevDedup achieves high deduplication efficiency with around 97% of saving, and high backup…
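The contrast between conventional and reverse deduplication described in the abstract can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not the RevDedup implementation: the function names (conventional_dedup, reverse_dedup, read_chunk, fragmentation), the slot representation, and the choice to compare each new version only against its immediate predecessor are all hypothetical, chosen for illustration. A "ref" slot stands for a chunk whose data lives elsewhere, so the number of "ref" slots in a version is a rough proxy for how fragmented a read of that version would be.

```python
from hashlib import sha256

# A slot is either ("data", chunk_bytes) or ("ref", version_index, slot_index).

def conventional_dedup(versions):
    """Store versions oldest-first; duplicates in NEW data become references to old data."""
    index = {}    # fingerprint -> (version, slot) of the first stored copy
    layout = []
    for v, chunks in enumerate(versions):
        slots = []
        for i, chunk in enumerate(chunks):
            fp = sha256(chunk).digest()
            if fp in index:
                slots.append(("ref",) + index[fp])
            else:
                index[fp] = (v, i)
                slots.append(("data", chunk))
        layout.append(slots)
    return layout

def reverse_dedup(versions):
    """Store each new version whole; duplicates in the PREVIOUS version become
    references forward to the new version (toy assumption: compare only with
    the immediate predecessor)."""
    layout = []
    for v, chunks in enumerate(versions):
        new_index = {}
        for i, chunk in enumerate(chunks):
            new_index.setdefault(sha256(chunk).digest(), i)
        if layout:
            prev = layout[-1]
            for i, slot in enumerate(prev):
                if slot[0] == "data":
                    fp = sha256(slot[1]).digest()
                    if fp in new_index:
                        prev[i] = ("ref", v, new_index[fp])
        layout.append([("data", chunk) for chunk in chunks])
    return layout

def read_chunk(layout, v, i):
    """Follow references until a slot holding actual data is reached."""
    slot = layout[v][i]
    while slot[0] == "ref":
        slot = layout[slot[1]][slot[2]]
    return slot[1]

def fragmentation(layout, v):
    """Count slots of version v whose data is stored in another version."""
    return sum(1 for slot in layout[v] if slot[0] == "ref")

if __name__ == "__main__":
    versions = [[b"A", b"B", b"C"], [b"A", b"B", b"D"], [b"A", b"E", b"D"]]
    for name, build in (("conventional", conventional_dedup), ("reverse", reverse_dedup)):
        layout = build(versions)
        counts = [fragmentation(layout, v) for v in range(len(versions))]
        print(name, "refs per version (oldest to latest):", counts)
```

In this toy model the conventional layout leaves the oldest version whole and scatters the latest one across references, while the reverse layout does the opposite; old versions remain readable by following reference chains forward.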
Taxonomy
Topics: Advanced Data Storage Technologies · Caching and Content Delivery · Cloud Data Security Solutions
