# Clustered Distributed Data Storage Repairing Multiple Failures

**Authors:** Shiqiu Liu, Fangwei Ye, Qihui Wu

PMC · DOI: 10.3390/e27030313 · Entropy · 2025-03-17

## TL;DR

This paper studies how to repair multiple node failures in a clustered (rack-aware) distributed storage system when failed nodes download data only from nodes in their own cluster, where bandwidth is much cheaper than across clusters.

## Contribution

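The paper derives the trade-off between storage and repair bandwidth for clustered DSSs and provides explicit code constructions that achieve its two extreme points: the minimum storage clustered collaborative repair (MSCCR) point and the minimum bandwidth clustered collaborative repair (MBCCR) point.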

## Key findings

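- Downloading repair data from nodes in the same cluster is much cheaper than downloading from nodes in another cluster, so the paper restricts failed nodes to intra-cluster downloads.
- When multiple nodes fail, the failed nodes in the same cluster can collaborate by exchanging data with each other.
- Explicit code constructions achieve the two extreme points of the storage versus repair bandwidth trade-off, MSCCR and MBCCR.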

## Abstract

A clustered distributed storage system (DSS), also called a rack-aware storage system, is a distributed storage system in which the nodes are grouped into several clusters. The communication between two clusters may be restricted by their connectivity; that is to say, the communication cost between nodes differs depending on their location. As such, when repairing a failed node, downloading data from nodes that are in the same cluster is much cheaper and more efficient than downloading data from nodes in another cluster. In this article, we consider a scenario in which the failed nodes only download data from nodes in the same cluster, which is an extreme and important case that leverages the fact that the intra-cluster bandwidth is much cheaper than the cross-cluster repair bandwidth. Also, we study the problem of repairing multiple failures in this article, which allows for collaboration within the same cluster, i.e., failed nodes in the same cluster can exchange data with each other. We derive the trade-off between the storage and repair bandwidth for the clustered DSSs and provide explicit code constructions achieving two extreme points in the trade-off, namely the minimum storage clustered collaborative repair (MSCCR) point and the minimum bandwidth clustered collaborative repair (MBCCR) point, respectively.
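The repair scenario in the abstract can be illustrated with a minimal sketch. It only models who talks to whom: each failed node downloads from surviving nodes in its own cluster and exchanges data with other failed nodes in that cluster. The names `Node`, `plan_intra_cluster_repair`, `repair_traffic`, `beta`, and `beta_prime` are hypothetical, and counting a uniform `beta` symbols per helper and `beta_prime` symbols per failed peer is an assumption borrowed from the usual cooperative-repair accounting, not a formula taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    cluster: int


def plan_intra_cluster_repair(nodes, failed_ids):
    """For each failed node, list same-cluster helpers and same-cluster failed peers."""
    failed = set(failed_ids)
    plan = {}
    for n in nodes:
        if n.node_id not in failed:
            continue
        # Only nodes in the same cluster are ever contacted.
        same = [m for m in nodes if m.cluster == n.cluster and m.node_id != n.node_id]
        helpers = [m.node_id for m in same if m.node_id not in failed]
        peers = [m.node_id for m in same if m.node_id in failed]
        plan[n.node_id] = {"helpers": helpers, "peers": peers}
    return plan


def repair_traffic(plan, beta, beta_prime):
    """Symbols received per failed node (assumed model, not the paper's formula):
    beta from each surviving helper plus beta_prime from each failed peer."""
    return {
        nid: len(p["helpers"]) * beta + len(p["peers"]) * beta_prime
        for nid, p in plan.items()
    }


# Hypothetical layout: 2 clusters of 4 nodes; nodes 0 and 1 fail in cluster 0, node 5 in cluster 1.
nodes = [Node(i, i // 4) for i in range(8)]
plan = plan_intra_cluster_repair(nodes, failed_ids=[0, 1, 5])
traffic = repair_traffic(plan, beta=2, beta_prime=1)
print(plan)
print(traffic)
```

In this toy layout, nodes 0 and 1 share the two surviving helpers in cluster 0 and exchange data with each other, while node 5 has no failed peer and relies on its three surviving cluster mates alone.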

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11941202/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11941202/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC11941202/full.md

---
Source: https://tomesphere.com/paper/PMC11941202