Evaluating Impact of Human Errors on the Availability of Data Storage Systems
Mostafa Kishani, Reza Eftekhari, and Hossein Asadi

TL;DR
This paper assesses how human errors, specifically incorrect disk replacements, significantly impact data storage system availability, revealing that neglecting such errors can lead to substantial underestimations of system unavailability.
Contribution
It introduces a combined Monte Carlo and Markov model to quantify the impact of human errors on storage system availability, challenging traditional assumptions about RAID dependability.
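As an illustration of how a Markov model and a Monte Carlo simulation can cross-check each other, here is a minimal sketch for a two-disk mirror. It is not the authors' model. The three states (OK, DEGRADED, DOWN), the function names, and all rate values (`lam`, `mu`, `h`, `mu_r`) are assumptions chosen for this example: `lam` is a per-disk failure rate, `mu` a replacement rate, `h` the probability that a replacement pulls the wrong disk, and `mu_r` the rate of recovery from the unavailable state.

```python
import random


def steady_state_unavailability(lam, mu, h, mu_r):
    """Closed-form steady-state probability of DOWN for a 3-state chain.

    Assumed transitions (hypothetical, for illustration only):
      OK -> DEGRADED        at rate 2*lam       (either disk fails)
      DEGRADED -> OK        at rate mu*(1-h)    (correct replacement)
      DEGRADED -> DOWN      at rate lam + mu*h  (second failure or wrong disk pulled)
      DOWN -> OK            at rate mu_r        (recovery)
    """
    # Balance equations give each state's probability relative to OK.
    degraded_ratio = 2 * lam / (lam + mu)
    down_ratio = degraded_ratio * (lam + mu * h) / mu_r
    return down_ratio / (1 + degraded_ratio + down_ratio)


def simulate_unavailability(lam, mu, h, mu_r, horizon, seed=0):
    """Monte Carlo estimate of the fraction of time spent in DOWN."""
    rng = random.Random(seed)
    t = 0.0
    down_time = 0.0
    state = 0  # 0 = OK, 1 = DEGRADED, 2 = DOWN
    while t < horizon:
        if state == 0:
            dwell, nxt = rng.expovariate(2 * lam), 1
        elif state == 1:
            dwell = rng.expovariate(lam + mu)
            p_down = (lam + mu * h) / (lam + mu)
            nxt = 2 if rng.random() < p_down else 0
        else:
            dwell, nxt = rng.expovariate(mu_r), 0
        dwell = min(dwell, horizon - t)  # truncate the last sojourn at the horizon
        if state == 2:
            down_time += dwell
        t += dwell
        state = nxt
    return down_time / horizon


if __name__ == "__main__":
    params = dict(lam=0.01, mu=1.0, mu_r=0.5)  # hypothetical rates
    with_error = steady_state_unavailability(h=0.1, **params)
    without_error = steady_state_unavailability(h=0.0, **params)
    simulated = simulate_unavailability(h=0.1, horizon=5e6, **params)
    print("Markov, human error considered:", with_error)
    print("Markov, human error ignored:   ", without_error)
    print("Monte Carlo estimate:          ", simulated)
    print("Underestimation factor:        ", with_error / without_error)
```

Setting `h=0` recovers a model that ignores human error, so the ratio of the two Markov results shows how much unavailability such a model would miss under these assumed rates. The size of that gap depends entirely on the parameters chosen and says nothing about the paper's reported figures.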
Findings
Ignoring incorrect disk replacements can cause unavailability to be underestimated by up to three orders of magnitude (1000x).
Considering human errors, RAID1 may be less available than RAID5.
Automatic fail-over policies influence system availability significantly.
Abstract
In this paper, we investigate the effect of incorrect disk replacement service on the availability of data storage systems. To this end, we first conduct Monte Carlo simulations to evaluate the availability of the disk subsystem by considering disk failures and incorrect disk replacement service. We also propose a Markov model that corroborates the Monte Carlo simulation results. We further extend the proposed model to consider the effect of an automatic disk fail-over policy. The results obtained by the proposed model show that overlooking the impact of incorrect disk replacement can result in underestimating unavailability by up to three orders of magnitude. Moreover, this study suggests that by considering the effect of human errors, the conventional beliefs about the dependability of different RAID mechanisms should be revised. The results show that in the presence of human errors, RAID1 can…
Taxonomy
Topics: Advanced Data Storage Technologies · Cloud Computing and Resource Management · Caching and Content Delivery
