Increasing Availability in Distributed Storage Systems via Clustering

Saeid Sahraei; Michael Gastpar

arXiv:1710.02653·cs.IT·March 6, 2019

TL;DR

This paper proposes the Fixed Cluster Repair System (FCRS), a new distributed storage architecture that improves repair bandwidth efficiency while maintaining high availability through clustering, and introduces Cubic Codes for optimal repair under various models.

Contribution

It introduces the FCRS architecture for distributed storage, characterizes its repair bandwidth vs. storage trade-off, and designs Cubic Codes to minimize repair bandwidth under exact repair.

Findings

01

FCRS guarantees availability of s-1 with small repair bandwidth.

02

Under exact repair, Cubic Codes improve the minimum repair bandwidth by an asymptotic multiplicative factor of 0.79 relative to existing codes.

03

Cubic Codes are optimal for 2 and 3 clusters, and under the repair-by-transfer model.

Abstract

We introduce the Fixed Cluster Repair System (FCRS) as a novel architecture for Distributed Storage Systems (DSS), achieving a small repair bandwidth while guaranteeing high availability. Specifically, we partition the set of servers in a DSS into $s$ clusters and allow a failed server to choose any cluster other than its own as its repair group. Thereby, we guarantee an availability of $s - 1$. We characterize the repair bandwidth vs. storage trade-off for the FCRS under functional repair and show that the minimum repair bandwidth can be improved by an asymptotic multiplicative factor of $2/3$ compared to state-of-the-art coding techniques that guarantee the same availability. We further introduce Cubic Codes, designed to minimize the repair bandwidth of the FCRS under the exact repair model. We prove an asymptotic multiplicative improvement of $0.79$ in the minimum repair bandwidth…
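The clustering rule in the abstract can be illustrated with a small sketch. It is not the authors' construction: the function names, the contiguous near-equal partition, and the example sizes are assumptions made for illustration, and no coding or bandwidth accounting is modeled. It only shows why a failed server has $s - 1$ disjoint candidate repair groups.

```python
from typing import List


def partition_into_clusters(num_servers: int, s: int) -> List[List[int]]:
    """Split server ids 0..num_servers-1 into s contiguous clusters.

    Assumption: clusters are as equal in size as possible; the paper's
    own partitioning rule may differ.
    """
    if s < 2 or num_servers < s:
        raise ValueError("need at least 2 clusters and one server per cluster")
    base, extra = divmod(num_servers, s)
    clusters: List[List[int]] = []
    start = 0
    for i in range(s):
        size = base + (1 if i < extra else 0)  # first `extra` clusters get one more
        clusters.append(list(range(start, start + size)))
        start += size
    return clusters


def repair_groups(clusters: List[List[int]], failed_server: int) -> List[List[int]]:
    """Return every cluster other than the failed server's own cluster."""
    own = next(i for i, c in enumerate(clusters) if failed_server in c)
    return [c for i, c in enumerate(clusters) if i != own]


# Hypothetical example: 12 servers, s = 4 clusters, server 5 fails.
clusters = partition_into_clusters(12, 4)
groups = repair_groups(clusters, failed_server=5)
availability = len(groups)  # number of disjoint candidate repair groups
```

Because the clusters are disjoint, each of the remaining clusters is an independent repair option, which matches the availability of $s - 1$ stated in the abstract.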
