D-Rex: Heterogeneity-Aware Reliability Framework and Adaptive Algorithms for Distributed Storage
Maxime Gonthier (1, 2), Dante D. Sanchez-Gallegos (3), Haochen Pan (1), Bogdan Nicolae (2), Sicheng Zhou (4), Hai Duc Nguyen (1, 2), Valerie Hayot-Sasson (1, 2), J. Gregory Pauloski (1), Jesus Carretero (3), Kyle Chard (1, 2), Ian Foster (1, 2) ((1) University of Chicago

TL;DR
This paper introduces D-Rex, a set of adaptive algorithms designed to optimize data storage, reliability, and efficiency in heterogeneous distributed storage systems using erasure coding.
Contribution
The paper presents novel dynamic scheduling algorithms, D-Rex LB and D-Rex SC, tailored for heterogeneous environments, improving storage utilization and reliability.
Findings
D-Rex algorithms store 45% more data items than existing methods.
D-Rex SC balances storage utilization and throughput, at the price of a higher computational cost.
Greedy algorithms increase storage and throughput by 21%.
Abstract
The exponential growth of data necessitates distributed storage models, such as peer-to-peer systems and data federations. While distributed storage can reduce costs and increase reliability, the heterogeneity in storage capacity, I/O performance, and failure rates of storage resources makes their efficient use a challenge. Further, node failures are common and can lead to data unavailability and even data loss. Erasure coding is a common resiliency strategy implemented in storage systems to mitigate failures by striping data across storage locations. However, erasure coding is computationally expensive and existing systems do not consider the heterogeneous resources and their varied capacity and performance when placing data chunks. We tackle the challenges of using erasure coding with distributed and heterogeneous nodes, aiming to store as much data as possible, minimize encoding and…
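To make the reliability side of the abstract concrete, here is a minimal Python sketch of how one might check whether an erasure-coded item meets a reliability target on nodes with different failure rates. It is not the D-Rex algorithm. It assumes independent node failures and an MDS code such as Reed-Solomon, where any K of the K + P chunks suffice to rebuild the item. The function names (survival_probability, min_parity, storage_overhead) and the example failure probabilities are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def survival_probability(fail_probs: Sequence[float], parity: int) -> float:
    """Probability that at most `parity` of the chosen nodes fail.

    Assumption: failures are independent and the code is MDS, so the item
    survives as long as no more than `parity` chunks are lost.
    """
    # dist[j] = probability that exactly j of the nodes seen so far failed
    dist = [1.0]
    for p in fail_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)  # this node survives
            nxt[j + 1] += mass * p      # this node fails
        dist = nxt
    return sum(dist[: parity + 1])


def min_parity(fail_probs: Sequence[float], target: float) -> Optional[int]:
    """Smallest P meeting `target` when one chunk is placed on each node.

    Returns None if no split of the nodes into K >= 1 data chunks and
    P parity chunks reaches the target.
    """
    for parity in range(len(fail_probs)):
        if survival_probability(fail_probs, parity) >= target:
            return parity
    return None


def storage_overhead(k: int, parity: int) -> float:
    """Bytes stored per byte of user data for a (K data, P parity) layout."""
    return (k + parity) / k


if __name__ == "__main__":
    # Hypothetical per-node failure probabilities over the retention period.
    nodes = [0.01, 0.02, 0.05, 0.10, 0.10, 0.20]
    p = min_parity(nodes, target=0.999)
    if p is None:
        print("target not reachable on these nodes")
    else:
        k = len(nodes) - p
        print(f"K={k}, P={p}, overhead={storage_overhead(k, p):.2f}")
```

The sketch shows the trade-off the abstract describes: more parity chunks raise the survival probability but also raise the storage overhead (K + P) / K, and unreliable nodes push the required P up. A heterogeneity-aware scheduler would additionally weigh node capacity and I/O bandwidth, which this sketch leaves out.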
