Distributed storage algorithms with optimal tradeoffs

Michael Luby; Thomas Richardson

arXiv:2101.05223 · cs.IT · January 14, 2021 · 1 citation

Open Access

TL;DR

This paper introduces algorithms that asymptotically achieve the optimal tradeoff between network traffic and storage overhead in distributed storage systems, so that source data remains reliably stored over long periods despite node failures.

Contribution

The paper presents algorithms that asymptotically attain the fundamental capacity bound, establishing the optimal tradeoff between repair rate and storage overhead.

Findings

01

Algorithms achieve the theoretical capacity bound asymptotically.

02

Optimal tradeoff between network traffic and storage overhead demonstrated.

03

Provides a fundamental limit for distributed storage reliability.

Abstract

One of the primary objectives of a distributed storage system is to reliably store large amounts of source data for long durations using a large number $N$ of unreliable storage nodes, each with $c$ bits of storage capacity. Storage nodes fail randomly over time and are replaced with nodes of equal capacity initialized to zeroes, and thus bits are erased at some rate $e$. To maintain recoverability of the source data, a repairer continually reads data over a network from nodes at an average rate $r$, and generates and writes data to nodes based on the read data. The distributed storage source capacity is the maximum amount of source data that can be reliably stored for long periods of time. Previous research shows that asymptotically the distributed storage source capacity is at most $(1 - \frac{e}{2 \cdot r}) \cdot N \cdot c$ as $N$ and $r$ grow. In this work we introduce and…
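
To make the bound concrete, the following minimal sketch evaluates $(1 - \frac{e}{2 \cdot r}) \cdot N \cdot c$ for given parameters and inverts it to find the read rate needed for a target storage overhead. The function names and the sample numbers are hypothetical and do not come from the paper; the sketch assumes $e$ and $r$ are expressed in the same units, and it plugs finite values into a bound that the abstract states only asymptotically, so the results are illustrative rather than exact.

```python
import math


def capacity_upper_bound(n_nodes: int, c_bits: float, e: float, r: float) -> float:
    """Evaluate (1 - e / (2 * r)) * N * c, the asymptotic upper bound
    on source capacity quoted in the abstract.

    Assumption: e (erasure rate) and r (repairer read rate) share units,
    so e / (2 * r) is a dimensionless overhead fraction.
    """
    if r <= 0:
        raise ValueError("read rate r must be positive")
    overhead = e / (2 * r)
    # The expression is clamped at zero here purely for illustration:
    # when r <= e / 2 the formula gives no positive capacity.
    return max(0.0, 1.0 - overhead) * n_nodes * c_bits


def read_rate_for_overhead(e: float, overhead: float) -> float:
    """Invert overhead = e / (2 * r) to get the read rate r that
    corresponds to a target overhead fraction in (0, 1]."""
    if not 0 < overhead <= 1:
        raise ValueError("overhead must be in (0, 1]")
    return e / (2 * overhead)


# Hypothetical example values (not from the paper):
# 1000 nodes of 8e12 bits each, erasure rate 1e9, read rate 5e9.
bound = capacity_upper_bound(1000, 8e12, e=1e9, r=5e9)
needed_r = read_rate_for_overhead(e=1e9, overhead=0.1)
print(f"capacity bound: {bound:.3e} bits")
print(f"read rate for 10% overhead: {needed_r:.3e}")
```

In this form the tradeoff is easy to read off: halving the overhead fraction $\frac{e}{2 \cdot r}$ requires doubling the read rate $r$ for a fixed erasure rate $e$.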

Taxonomy

Topics: Advanced Data Storage Technologies · Distributed systems and fault tolerance · Caching and Content Delivery