RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups

Chun-Ho Ng; Patrick P. C. Lee

arXiv:1302.0621·cs.DC·June 28, 2013·23 cites

TL;DR

RevDedup introduces reverse deduplication to optimize read performance for latest VM backups by shifting fragmentation to older data, achieving high deduplication efficiency and throughput.

Contribution

It proposes reverse deduplication, which improves read performance for the latest backups by removing duplicates from old data rather than from new data, so that fragmentation accumulates in older backups.

Findings

01

Achieves around 97% storage savings.

02

Provides backup and read throughput of approximately 1 GB/s.

03

Maintains small metadata overhead.

Abstract

Scaling up the backup storage for an ever-increasing volume of virtual machine (VM) images is a critical issue in virtualization environments. While deduplication is known to effectively eliminate duplicates for VM image storage, it also introduces fragmentation that will degrade read performance. We propose RevDedup, a deduplication system that optimizes reads to latest VM image backups using an idea called reverse deduplication. In contrast with conventional deduplication that removes duplicates from new data, RevDedup removes duplicates from old data, thereby shifting fragmentation to old data while keeping the layout of new data as sequential as possible. We evaluate our RevDedup prototype using microbenchmark and real-world workloads. For a 12-week span of real-world VM images from 160 users, RevDedup achieves high deduplication efficiency with around 97% of saving, and high backup…
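
To make the idea of reverse deduplication concrete, here is a minimal Python sketch. It is a toy model, not the RevDedup prototype: the class `ReverseDedupStore`, its methods, the use of SHA-1 fingerprints, and the choice to compare each new backup only against the immediately preceding one are all assumptions made for illustration. The new backup is always stored whole and in order, and matching blocks in the older backup are replaced by references into the newer one.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Content fingerprint used to detect duplicate blocks (assumed: SHA-1)."""
    return hashlib.sha1(block).hexdigest()


class ReverseDedupStore:
    """Hypothetical toy store illustrating reverse deduplication."""

    def __init__(self):
        # Each version is a list of entries, either
        # ("data", bytes) or ("ref", version_index, block_index).
        self.versions = []

    def write_backup(self, blocks):
        new_idx = len(self.versions)
        # The new backup is stored in full, in its original order.
        self.versions.append([("data", b) for b in blocks])
        if new_idx == 0:
            return
        # Index the new backup by fingerprint (first occurrence wins).
        index = {}
        for i, b in enumerate(blocks):
            index.setdefault(fingerprint(b), i)
        # Remove duplicates from the OLD backup by pointing them
        # at the identical block in the new backup.
        prev = self.versions[new_idx - 1]
        for i, entry in enumerate(prev):
            if entry[0] == "data":
                j = index.get(fingerprint(entry[1]))
                if j is not None:
                    prev[i] = ("ref", new_idx, j)

    def _resolve(self, entry):
        # References always point to a newer version, so this terminates.
        while entry[0] == "ref":
            _, version, block = entry
            entry = self.versions[version][block]
        return entry[1]

    def read_backup(self, version):
        return [self._resolve(e) for e in self.versions[version]]

    def stored_blocks(self):
        """Number of blocks physically kept across all versions."""
        return sum(1 for v in self.versions for e in v if e[0] == "data")
```

In this sketch, reading the latest backup never follows a reference, while reading an older backup may chase a chain of references through newer versions. That is the trade-off the abstract describes: fragmentation is shifted to old data.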

Taxonomy

Topics: Advanced Data Storage Technologies · Caching and Content Delivery · Cloud Data Security Solutions