Centralized Multi-Node Repair Regenerating Codes

Marwen Zorgui; Zhiying Wang

arXiv:1706.05431·cs.IT·May 9, 2018·6 cites

Open Access

TL;DR

This paper investigates the fundamental tradeoffs in repairing multiple node failures in distributed storage, proposing optimal strategies, code conversions, and proving limitations for exact repair scenarios.

Contribution

It establishes the tradeoff between repair bandwidth and storage size for multi-node failures under functional repair, derives a closed-form expression for that tradeoff, and introduces a framework for converting single-failure codes into multi-node repair codes.

Findings

01

Optimal tradeoff between repair bandwidth and storage size identified for functional repair, with a closed-form expression.

02

Framework for converting single-erasure codes into multi-node repair codes proposed.

03

The functional-repair MBMR point is not achievable by linear exact-repair codes.

Abstract

In a distributed storage system, recovering from multiple failures is a critical and frequent task that is crucial for maintaining the system's reliability and fault-tolerance. In this work, we focus on the problem of repairing multiple failures in a centralized way, which can be desirable in many data storage configurations, and we show that a significant repair traffic reduction is possible. First, the fundamental tradeoff between the repair bandwidth and the storage size for functional repair is established. Using a graph-theoretic formulation, the optimal tradeoff is identified as the solution to an integer optimization problem, for which a closed-form expression is derived. Expressions of the extreme points, namely the minimum storage multi-node repair (MSMR) and minimum bandwidth multi-node repair (MBMR) points, are obtained. Second, we describe a general framework for converting…
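To give a feel for the "significant repair traffic reduction" mentioned in the abstract, the sketch below compares two repair bandwidths at the minimum-storage point. It is illustrative only and rests on assumptions not stated in the text above: a file of size M stored with per-node storage M/k, each repair contacting d helper nodes, e failed nodes with 1 <= e <= k <= d, and the standard cut-set bound, which gives a centralized bandwidth of M·d·e / (k·(d − k + e)). The baseline repairs the e nodes one at a time at the classical single-failure minimum-storage point. The function names are hypothetical.

```python
from fractions import Fraction


def centralized_ms_bandwidth(M, k, d, e):
    """Cut-set bound on total repair bandwidth when e nodes are repaired
    together at the minimum-storage point (assumed formula, see lead-in)."""
    if not (1 <= e <= k <= d):
        raise ValueError("this sketch assumes 1 <= e <= k <= d")
    return Fraction(M) * d * e / (k * (d - k + e))


def separate_ms_bandwidth(M, k, d, e):
    """Baseline: e independent single-node repairs, each at the classical
    single-failure minimum-storage bound M*d / (k*(d - k + 1))."""
    return e * centralized_ms_bandwidth(M, k, d, 1)


if __name__ == "__main__":
    M, k, d, e = 1, 6, 10, 2  # illustrative parameters, not from the paper
    central = centralized_ms_bandwidth(M, k, d, e)
    separate = separate_ms_bandwidth(M, k, d, e)
    print("centralized:", central)
    print("separate:   ", separate)
    print("ratio:      ", central / separate)
```

Under these assumptions the ratio of the two bandwidths is (d − k + 1) / (d − k + e), so the saving grows with the number of simultaneous failures e, and the two quantities coincide when e = 1.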

Taxonomy

Topics: Advanced Data Storage Technologies · Caching and Content Delivery · Distributed systems and fault tolerance