Capacity of Clustered Distributed Storage
Jy-yong Sohn, Beongjun Choi, Sung Whan Yoon, and Jaekyun Moon

TL;DR
This paper models the capacity of clustered distributed storage systems, analyzing how intra- and cross-cluster repair bandwidths affect reliable data storage and proposing methods to minimize cross-cluster traffic.
Contribution
It introduces a new clustered storage model that differentiates intra- and cross-cluster repair bandwidths and derives capacity expressions based on these resources.
Findings
Cross-cluster traffic can be reduced to zero by provisioning extra storage and intra-cluster repair bandwidth.
Capacity depends on storage, intra-cluster, and cross-cluster bandwidths.
Trade-offs exist between intra- and cross-cluster traffic for large storage capacities.
Abstract
A new system model reflecting the clustered structure of distributed storage is suggested to investigate bandwidth requirements for repairing failed storage nodes. Large data centers with multiple racks/disks or local networks of storage devices (e.g., sensor networks) are good applications of the suggested clustered model. In realistic scenarios involving clustered storage structures, repairing storage nodes using intact nodes residing in other clusters is more bandwidth-consuming than restoring nodes based on information from intra-cluster nodes. Therefore, it is important to differentiate between intra-cluster repair bandwidth and cross-cluster repair bandwidth in modeling distributed storage. The capacity of the suggested model is obtained as a function of fundamental resources of distributed storage systems, namely, storage capacity, intra-cluster repair bandwidth and cross-cluster…
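
The distinction between the two repair bandwidths can be made concrete with a small sketch. It assumes, purely for illustration and not from the text above, a symmetric layout in which nodes are split evenly across clusters and a failed node downloads `beta_intra` units from every surviving node in its own cluster and `beta_cross` units from every node in the other clusters. The class name, field names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusteredStorage:
    """Hypothetical symmetric clustered layout (illustrative assumption)."""

    n_nodes: int        # total number of storage nodes
    n_clusters: int     # number of clusters; must divide n_nodes evenly
    beta_intra: float   # data downloaded from each helper in the same cluster
    beta_cross: float   # data downloaded from each helper in another cluster

    def __post_init__(self) -> None:
        if self.n_clusters <= 0 or self.n_nodes % self.n_clusters != 0:
            raise ValueError("n_nodes must be a positive multiple of n_clusters")

    @property
    def nodes_per_cluster(self) -> int:
        return self.n_nodes // self.n_clusters

    def intra_repair_bandwidth(self) -> float:
        # Helpers inside the failed node's cluster: every node except itself.
        return (self.nodes_per_cluster - 1) * self.beta_intra

    def cross_repair_bandwidth(self) -> float:
        # Helpers outside the failed node's cluster: all nodes in other clusters.
        return (self.n_nodes - self.nodes_per_cluster) * self.beta_cross

    def total_repair_bandwidth(self) -> float:
        return self.intra_repair_bandwidth() + self.cross_repair_bandwidth()


if __name__ == "__main__":
    system = ClusteredStorage(n_nodes=12, n_clusters=3, beta_intra=2.0, beta_cross=0.5)
    print("intra-cluster:", system.intra_repair_bandwidth())
    print("cross-cluster:", system.cross_repair_bandwidth())
    print("total:", system.total_repair_bandwidth())
```

Setting `beta_cross` to 0 in this sketch models purely local repair, the regime in which cross-cluster traffic is eliminated.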
