Architecture-aware Coding for Distributed Storage: Repairable Block Failure Resilient Codes
Gokhan Calis, O. Ozan Koyluoglu

TL;DR
This paper introduces Block Failure Resilient (BFR) codes for distributed storage systems, analyzing their resilience, deriving bounds, and constructing optimal codes to improve data recovery and repair efficiency in the presence of block failures.
Contribution
It proposes a new framework for block failure resilience, derives bounds, and constructs explicit codes achieving optimal trade-offs and resilience in distributed storage systems.
Findings
Derived file size bounds for repairable BFR codes
Constructed explicit BFR code examples at MSR and MBR points
Developed BFR-LRC with optimal resilience and repair efficiency
Abstract
In large-scale distributed storage systems (DSS) deployed in cloud computing, correlated failures that make whole blocks of nodes fail (or become unavailable) simultaneously are common. In such scenarios, the stored data or the content of a failed node can only be reconstructed from the live nodes in the blocks that remain available. To analyze the resilience of the system against such block failures, this work introduces the framework of Block Failure Resilient (BFR) codes, wherein the data (e.g., a file in DSS) can be decoded by reading the same number of codeword symbols (nodes) from each block in a subset of the available blocks of the underlying codeword. Further, repairable BFR codes are introduced, wherein any codeword symbol in a failed block can be repaired by contacting a subset of the remaining blocks in the system. File size bounds for repairable BFR codes are derived, and the…
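The decoding property described in the abstract can be illustrated with a toy example. The sketch below is not the paper's construction. It assumes a Reed-Solomon-style polynomial evaluation code over the prime field GF(257), with 12 nodes split into 4 blocks of 3, and checks that a 6-symbol file can be recovered by reading 2 symbols from each of any 3 blocks. The field size, the block layout, and the function names (`encode`, `decode`, `check_bfr`) are all assumptions chosen for illustration. It covers only data recovery, not the repair property.

```python
from itertools import combinations, product

P = 257  # prime field size (assumption for this toy example)

def encode(message, n):
    """Evaluate the message polynomial at points 1..n over GF(P)."""
    return [sum(m * pow(x, i, P) for i, m in enumerate(message)) % P
            for x in range(1, n + 1)]

def decode(points, k):
    """Recover k coefficients from k (x, y) pairs by Lagrange interpolation mod P."""
    points = points[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(points):
        basis = [1]   # coefficients of prod_{m != j} (x - xm), lowest degree first
        denom = 1     # prod_{m != j} (xj - xm)
        for m, (xm, _) in enumerate(points):
            if m == j:
                continue
            new = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                new[i] = (new[i] - xm * b) % P
                new[i + 1] = (new[i + 1] + b) % P
            basis = new
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P  # modular inverse (Python 3.8+)
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return coeffs

def check_bfr(message, num_blocks, block_size, blocks_read, per_block):
    """Check decoding from every choice of `blocks_read` blocks,
    reading `per_block` nodes from each chosen block."""
    codeword = encode(message, num_blocks * block_size)
    # Node i (0-based) is placed in block i // block_size.
    blocks = [list(range(b * block_size, (b + 1) * block_size))
              for b in range(num_blocks)]
    for chosen in combinations(blocks, blocks_read):
        for picks in product(*(combinations(blk, per_block) for blk in chosen)):
            nodes = [i for pick in picks for i in pick]
            points = [(i + 1, codeword[i]) for i in nodes]
            if decode(points, len(message)) != message:
                return False
    return True

message = [10, 20, 30, 40, 50, 60]   # file of k = 6 symbols
ok = check_bfr(message, num_blocks=4, block_size=3, blocks_read=3, per_block=2)
print(ok)
```

Any MDS code passes this check whenever the number of symbols read is at least the file size, so the example shows only what the data-recovery requirement means, not how the repairable codes in the paper are built.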
