Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes
Anne-Marie Kermarrec, Gilles Straub, Nicolas Le Scouarnec

TL;DR
This paper introduces coordinated regenerating codes, which repair multiple failures simultaneously in distributed storage, and adaptive regenerating codes, which allow the code parameters to change at each repair. Both achieve the optimal tradeoff between storage and repair bandwidth.
Contribution
It presents novel coordinated and adaptive regenerating codes that support multiple failure repairs and parameter adaptation, extending the capabilities of existing regenerating codes.
Findings
Coordinated regenerating codes support simultaneous multiple device repairs.
Adaptive regenerating codes allow the code parameters to be adapted at each repair.
Lazy repairs reduce disk-related costs (disk bandwidth and disk I/O) but cannot reduce network bandwidth costs.
Abstract
Erasure correcting codes are widely used to ensure data persistence in distributed storage systems. This paper addresses the simultaneous repair of multiple failures in such codes. We go beyond existing work (i.e., regenerating codes by Dimakis et al.) by describing (i) coordinated regenerating codes (also known as cooperative regenerating codes), which support the simultaneous repair of multiple devices, and (ii) adaptive regenerating codes, which allow adapting the parameters at each repair. Like the regenerating codes of Dimakis et al., these codes achieve the optimal tradeoff between storage and repair bandwidth. Based on these extended regenerating codes, we study the impact of lazy repairs applied to regenerating codes and conclude that lazy repairs cannot reduce the costs in terms of network bandwidth but do reduce the disk-related costs (disk bandwidth and disk I/O).
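The storage versus repair-bandwidth tradeoff mentioned in the abstract can be made concrete with the two extreme points of the single-failure regenerating codes of Dimakis et al., which the paper extends. The sketch below is illustrative only and is not taken from this page: the function names, the symbols (M for file size, k for the number of devices sufficient to recover the file, d for the number of devices contacted during a repair), and the sample values are assumptions, and the formulas are the standard minimum-storage (MSR) and minimum-bandwidth (MBR) points for the single-failure case, not the coordinated or adaptive variants.

```python
from fractions import Fraction


def msr_point(M, k, d):
    """Minimum-storage point: returns (alpha, gamma), where alpha is the
    storage per device and gamma is the total data downloaded per repair."""
    if not 1 <= k <= d:
        raise ValueError("requires 1 <= k <= d")
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma


def mbr_point(M, k, d):
    """Minimum-bandwidth point: storage per device equals repair bandwidth."""
    if not 1 <= k <= d:
        raise ValueError("requires 1 <= k <= d")
    alpha = Fraction(2 * M * d, k * (2 * d - k + 1))
    return alpha, alpha


if __name__ == "__main__":
    M, k, d = 1200, 4, 6  # hypothetical values, e.g. a file of 1200 MB
    print("naive erasure-code repair downloads:", M)
    print("MSR (alpha, gamma):", msr_point(M, k, d))
    print("MBR (alpha, gamma):", mbr_point(M, k, d))
```

With these sample values, the MSR point stores 300 units per device and downloads 600 per repair, while the MBR point stores and downloads 400, compared with 1200 for a classical erasure code that rebuilds the whole file. Setting d = k in `msr_point` recovers that classical cost.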
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed systems and fault tolerance · Caching and Content Delivery
