Optimizing Apportionment of Redundancies in Hierarchical RAID
Alexander Thomasian

TL;DR
This paper investigates how to apportion a fixed amount of redundancy between the intra- and inter-storage-node (SN) levels of a hierarchical RAID system to maximize data reliability, and finds that intra-SN redundancy contributes more to reliability than inter-SN redundancy.
Contribution
It introduces an approximate reliability analysis combined with Monte-Carlo simulation to optimize redundancy apportionment in hierarchical RAID systems.
Findings
For a given total redundancy level, placing more of the check strips at the intra-SN level yields a higher Mean-Time-to-Data-Loss (MTTDL).
Optimal redundancy distribution favors intra-SN over inter-SN levels.
This result is contrary to that of earlier IBM work.
Abstract
Large disk arrays are organized into storage nodes (SNs), or bricks, each with its own cached RAID controller for multiple disks. Erasure coding at the SN level is attained via parity or Reed-Solomon codes. Hierarchical RAID (HRAID) provides an additional level of coding across SNs, e.g., check strips P, Q at the intra-SN level and R at the inter-SN level. Failed disks and SNs are not replaced, and rebuild is accomplished by restriping, e.g., overwriting P and Q for disk failures and R for an SN failure. For a given total redundancy level we use an approximate reliability analysis method and Monte-Carlo simulation to explore the better apportionment of check blocks for intra- vs inter-SN redundancy. Our study indicates that a higher MTTDL (Mean-Time-to-Data-Loss) is attained by associating higher reliability at intra-SN level rather than inter-SN level, which is contrary to that of an IBM…
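The apportionment question in the abstract can be illustrated with a toy Monte-Carlo sketch. This is not the paper's model. It assumes independent, exponentially distributed disk lifetimes, no replacement of failed components, an SN that is lost when it has had one more disk failure than it has intra-SN check strips, and data loss when one more SN is lost than there are inter-SN check strips. It ignores controller failures, restriping time, and the capacity consumed by restriping. All function and parameter names are hypothetical.

```python
import random

def time_to_data_loss(n_nodes, disks_per_node, intra_checks, inter_checks,
                      disk_mttf, rng):
    """Simulate one array lifetime under the simplified assumptions above."""
    node_loss_times = []
    for _ in range(n_nodes):
        # Draw exponential lifetimes (mean disk_mttf) for every disk in the SN.
        lifetimes = sorted(rng.expovariate(1.0 / disk_mttf)
                           for _ in range(disks_per_node))
        # The SN absorbs intra_checks disk failures; the next one loses it.
        node_loss_times.append(lifetimes[intra_checks])
    node_loss_times.sort()
    # The array absorbs inter_checks SN losses; the next one loses data.
    return node_loss_times[inter_checks]

def estimate_mttdl(n_nodes, disks_per_node, intra_checks, inter_checks,
                   disk_mttf=1.0, runs=2000, seed=0):
    """Average the simulated time to data loss over many independent runs."""
    if intra_checks >= disks_per_node or inter_checks >= n_nodes:
        raise ValueError("check strips must be fewer than disks / nodes")
    rng = random.Random(seed)
    total = sum(time_to_data_loss(n_nodes, disks_per_node, intra_checks,
                                  inter_checks, disk_mttf, rng)
                for _ in range(runs))
    return total / runs

if __name__ == "__main__":
    # Same total of three check strips, split two different ways.
    for intra, inter in [(2, 1), (1, 2)]:
        mttdl = estimate_mttdl(8, 8, intra, inter)
        print(f"intra={intra} inter={inter} MTTDL={mttdl:.4f} (units of disk MTTF)")
```

Because the sketch omits most of what the paper models, its numbers should not be read as reproducing the paper's results; it only shows the shape of the comparison between two splits of the same total redundancy.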
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Simulation Techniques and Applications
