Virtual Disk Snapshot Management at Scale
Kevin Nguetchouang, Theophile Dubuc, Stella Bitchebe, Alain Tchana, Pierre Olivier

TL;DR
This paper analyzes disk snapshot usage in large-scale clouds, identifies scalability issues caused by long snapshot chains, and proposes an extended Qcow2 format with a prototype that significantly improves performance and reduces memory overhead.
Contribution
It introduces an extension to the Qcow2 format and driver in Qemu to address scalability issues caused by long snapshot chains in cloud environments.
Findings
Snapshot chains in the studied public cloud sometimes reach up to 1000 files.
The extended Qcow2 format improves throughput by 48%.
Memory footprint is reduced by a factor of 15.
Abstract
Unlike other resources such as CPU, memory, and network, for which virtualization is efficiently achieved through direct access, disk virtualization is peculiar. In this paper, we make four contributions. Our first contribution is the characterization of disk utilization in a large-scale public cloud infrastructure. It reveals the presence of long snapshot chains, sometimes composed of up to 1000 files. Our second contribution is to show, through experimental measurements, that long chains lead to performance and memory footprint scalability issues. Our third contribution is the extension of the Qcow2 format and its driver in Qemu to address the identified scalability challenges. Our fourth contribution is the thorough evaluation of our prototype, called sQemu, demonstrating that it brings significant performance enhancements and memory footprint reduction. For example, it improves…
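To illustrate why chain length matters, here is a minimal sketch, not the paper's method or the Qemu driver's code. It assumes a toy model in which each snapshot file is a dictionary from cluster index to data, and a read starts at the active (newest) layer and falls back through the backing files until some layer holds the cluster. The names `Layer`, `read_cluster`, and `build_chain` are hypothetical; real Qcow2 lookups go through on-disk L1/L2 tables and in-memory caches that this model ignores.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: one dict per snapshot file, cluster index -> data.
Layer = Dict[int, bytes]


def read_cluster(chain: List[Layer], cluster: int) -> Tuple[Optional[bytes], int]:
    """Walk the chain from the active (newest) layer back to the base.

    Returns the data found (or None if no layer holds the cluster)
    and the number of layers that had to be consulted.
    """
    probes = 0
    for layer in reversed(chain):  # chain[-1] is the active layer
        probes += 1
        if cluster in layer:
            return layer[cluster], probes
    return None, probes


def build_chain(length: int) -> List[Layer]:
    """Build a chain whose only data lives in the base image (worst case)."""
    chain: List[Layer] = [{0: b"base"}]
    chain += [{} for _ in range(length - 1)]
    return chain


for n in (1, 10, 1000):
    _, probes = read_cluster(build_chain(n), 0)
    print(f"chain length {n}: {probes} layer(s) consulted")
```

In this toy model, a read of data that lives only in the base image consults every layer, so the work per read grows linearly with chain length. Each open layer also carries its own metadata, which is one intuition for the memory footprint growth reported in the abstract.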
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Peer-to-Peer Network Technologies
