On Minimizing Data-read and Download for Storage-Node Recovery
Nihar B. Shah

TL;DR
This paper investigates the limits of data-read and download efficiency in distributed storage recovery, proving bounds and demonstrating conditions under which these bounds can be simultaneously achieved.
Contribution
It completes the theoretical understanding of the data-read and download bounds for node recovery, covering the case where the number of helper nodes d is less than n-1.
Findings
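The known lower bounds on data-read and download cannot be achieved together when d<n-1, although the download bound alone can be.
The converse proof also applies to non-linear codes.
Both bounds can be met simultaneously under practical relaxations of the problem setting.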
Abstract
We consider the problem of efficient recovery of the data stored in any individual node of a distributed storage system, from the rest of the nodes. Applications include handling failures and degraded reads. We measure efficiency in terms of the amount of data-read and the download required. To minimize the download, we focus on the minimum bandwidth setting of the 'regenerating codes' model for distributed storage. Under this model, the system has a total of n nodes, and the data stored in any node must be (efficiently) recoverable from any d of the other (n-1) nodes. Lower bounds on the two metrics under this model were derived previously; it has also been shown that these bounds are achievable for the amount of data-read and download when d=n-1, and for the amount of download alone when d<n-1. In this paper, we complete this picture by proving the converse result, that when d<n-1,…
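As a rough illustration of the minimum bandwidth setting mentioned in the abstract, the sketch below computes the per-node storage and per-helper download at the minimum-bandwidth regenerating (MBR) point. It assumes the standard regenerating-codes cut-set bound; the parameter k (the number of nodes sufficient to recover the whole file), the file size, and the function names are not taken from the abstract and are introduced here only for illustration.

```python
from fractions import Fraction


def mbr_parameters(n: int, k: int, d: int, file_size: int = 1):
    """Return (alpha, beta) at the MBR point of the standard regenerating-codes model.

    alpha: amount stored per node; beta: amount downloaded from each of d helpers.
    Assumption (not from the abstract): any k nodes suffice to recover the file.
    """
    if not (1 <= k <= d <= n - 1):
        raise ValueError("need 1 <= k <= d <= n - 1")
    # Cut-set bound at the MBR point:
    #   file_size = sum_{i=0}^{k-1} (d - i) * beta = k * (2d - k + 1) / 2 * beta
    beta = Fraction(2 * file_size, k * (2 * d - k + 1))
    # At the MBR point a node stores exactly what it downloads during repair.
    alpha = d * beta
    return alpha, beta


def repair_download(d: int, beta: Fraction) -> Fraction:
    """Total download needed to recover one node from d helpers."""
    return d * beta


alpha, beta = mbr_parameters(n=5, k=3, d=4, file_size=9)
total = repair_download(4, beta)
```

In this example the total download equals the amount stored on the recovered node, which is the smallest download possible for recovering that node. The sketch says nothing about how much data each helper must read to produce its share, which is the second metric the paper studies.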
