A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster
K. V. Rashmi, Nihar B. Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, and Kannan Ramchandran

TL;DR
This paper studies the network impact of erasure-coded data recovery in Facebook's data center and proposes a new piggybacked code that significantly reduces network traffic during recovery.
Contribution
It provides the first measurement-based analysis of how erasure-code recovery affects the network of a production data center, and it introduces a new code, built with the Piggybacking framework, that reduces network usage during recovery.
Findings
Recovery of RS-coded data generates more than 100 TB/day of network traffic.
The proposed Piggybacking-based code reduces the network traffic of recovery by about 30%.
An implementation in HDFS could save nearly 50 TB/day of cross-rack traffic.
Abstract
Erasure codes, such as Reed-Solomon (RS) codes, are being increasingly employed in data centers to combat the cost of reliably storing large amounts of data. Although these codes provide optimal storage efficiency, they require significantly high network and disk usage during recovery of missing data. In this paper, we first present a study on the impact of recovery operations of erasure-coded data on the data-center network, based on measurements from Facebook's warehouse cluster in production. To the best of our knowledge, this is the first study of its kind available in the literature. Our study reveals that recovery of RS-coded data results in a significant increase in network traffic, more than a hundred terabytes per day, in a cluster storing multiple petabytes of RS-coded data. To address this issue, we present a new storage code using our recently proposed "Piggybacking"…
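The abstract's core idea can be illustrated with a toy code. The sketch below is a hypothetical (n=4, k=2) example over the prime field GF(257), with function names invented for illustration; it is not the code deployed in the paper. It stores two substripes of an MDS code and adds a "piggyback" (a function of the first substripe) to one parity of the second substripe. A plain MDS repair of node 1 downloads 4 symbols, while the piggybacked repair needs only 3. Data remains decodable from any 2 nodes, because the first substripe can be decoded first and its piggyback then subtracted.

```python
# Toy piggybacked (n=4, k=2) code over GF(P). Hypothetical illustration of
# the Piggybacking idea only; NOT the code evaluated in the paper.

P = 257  # assumed prime modulus for the toy field


def encode(a, b):
    """Encode substripes a=(a1, a2) and b=(b1, b2) onto 4 nodes.

    Each node stores one symbol per substripe. Nodes 3 and 4 hold the
    parities x1 + x2 and x1 + 2*x2; node 4's second symbol also carries
    the piggyback a1.
    """
    a1, a2 = a
    b1, b2 = b
    return [
        (a1, b1),                                     # node 1 (systematic)
        (a2, b2),                                     # node 2 (systematic)
        ((a1 + a2) % P, (b1 + b2) % P),               # node 3 (parity)
        ((a1 + 2 * a2) % P, (b1 + 2 * b2 + a1) % P),  # node 4 (parity + piggyback)
    ]


def repair_node1_mds(nodes):
    """Plain MDS repair: read both symbols from nodes 2 and 3 (4 symbols)."""
    a2, b2 = nodes[1]
    s_a, s_b = nodes[2]
    return ((s_a - a2) % P, (s_b - b2) % P), 4


def repair_node1_piggyback(nodes):
    """Piggyback repair: read 3 symbols, all from the second substripe."""
    b2 = nodes[1][1]             # node 2: b2
    b1 = (nodes[2][1] - b2) % P  # node 3: b1 + b2
    y = nodes[3][1]              # node 4: b1 + 2*b2 + a1
    a1 = (y - b1 - 2 * b2) % P   # strip the parity to expose the piggyback
    return (a1, b1), 3


def decode_from_parities(nodes):
    """Recover all data from nodes 3 and 4 alone (the MDS property)."""
    s, t = nodes[2][0], nodes[3][0]  # a1 + a2, a1 + 2*a2
    a2 = (t - s) % P
    a1 = (s - a2) % P
    u = nodes[2][1]                   # b1 + b2
    v = (nodes[3][1] - a1) % P        # remove piggyback: b1 + 2*b2
    b2 = (v - u) % P
    b1 = (u - b2) % P
    return (a1, a2), (b1, b2)


if __name__ == "__main__":
    stored = encode((10, 20), (30, 40))
    print(repair_node1_mds(stored))        # ((10, 30), 4)
    print(repair_node1_piggyback(stored))  # ((10, 30), 3)
```

In this toy setting the piggyback saves one of four downloaded symbols (25%); the paper reports about a 30% reduction for its own, larger code.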
Taxonomy
Topics: Advanced Data Storage Technologies · Caching and Content Delivery · Peer-to-Peer Network Technologies
