Dependability Analysis of Data Storage Systems in Presence of Soft Errors
Mostafa Kishani, Mehdi Tahoori, Hossein Asadi

TL;DR
This study investigates how soft errors in storage system controllers affect overall system dependability, introducing a new vulnerability metric and revealing significant impacts on data loss and unavailability.
Contribution
To the authors' knowledge, this is the first comprehensive system-level analysis of how soft errors affect data storage system dependability. It includes a novel vulnerability metric and extensive fault injection experiments.
Findings
Up to 40% of cache memory can contain end-user data vulnerable to soft errors.
Soft errors in cache data cause data loss, while errors in cache tags lead to data unavailability.
Detectable errors are the primary cause of data unavailability, whereas silent corruptions mainly cause data loss.
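The difference between a soft error in cache data and one in a cache tag can be illustrated with a toy model. The sketch below is not the paper's fault injection framework or its simulator. The `CacheLine` class, the `read` lookup, and the single-entry cache are hypothetical names and structures assumed purely for illustration. Under those assumptions, a bit flip in the data field is returned to the reader as if it were valid, while a bit flip in the tag field makes the lookup miss, so the cached copy can no longer be reached.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    """Hypothetical cache line: a tag identifying the block plus its payload."""
    tag: int
    data: int


def flip_bit(value, bit):
    """Model a single-bit soft error by inverting one bit of an integer."""
    return value ^ (1 << bit)


def read(cache, index, tag):
    """Return the cached data on a tag match, or None on a miss."""
    line = cache.get(index)
    if line is None or line.tag != tag:
        return None
    return line.data


def make_cache():
    """Build a one-entry toy cache holding the value 0xCAFE under tag 0b1010."""
    return {0: CacheLine(tag=0b1010, data=0xCAFE)}


# Case 1: soft error in the data field. The lookup still hits,
# but the value handed back differs from what was stored.
cache = make_cache()
cache[0].data = flip_bit(cache[0].data, 3)
data_result = read(cache, 0, 0b1010)

# Case 2: soft error in the tag field. The stored data is intact,
# but the lookup no longer matches, so the line cannot be found.
cache = make_cache()
cache[0].tag = flip_bit(cache[0].tag, 0)
tag_result = read(cache, 0, 0b1010)

print("data flip ->", None if data_result is None else hex(data_result))
print("tag flip  ->", tag_result)
```

In this toy model the first case corresponds to silently corrupted data and the second to data that exists but cannot be located. A real storage controller cache adds error detection, write-back state, and recovery paths that this sketch deliberately omits.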
Abstract
In recent years, high availability and reliability of Data Storage Systems (DSS) have been significantly threatened by soft errors occurring in storage controllers. Due to their specific functionality and hardware-software stack, error propagation and manifestation in DSS are quite different from general-purpose computing architectures. To our knowledge, no previous study has examined the system-level effects of soft errors on the availability and reliability of data storage systems. In this paper, we first analyze the effects of soft errors occurring in the server processors of storage controllers on the entire storage system dependability. To this end, we implemented the major functions of a typical data storage system controller, running on a full stack of storage system operating system, and developed a framework to perform fault injection experiments using a full system simulator.…
