On the Analysis of Reed Solomon Coding for Resilience to Transient/Permanent Faults in Highly Reliable Memories
L. Schiano, M. Ottavi, F. Lombardi, S. Pontarelli, A. Salsano

TL;DR
This paper compares simplex and duplex memory systems that combine Reed Solomon coding with scrubbing, and uses novel Markov chain models to evaluate how well each handles permanent and transient faults.
Contribution
It introduces novel Markov chain models of Reed Solomon coded memory systems and uses them to evaluate dynamic reconfiguration, error detection, and error correction under permanent and transient faults.
Findings
For a specific Reed Solomon code, the duplex arrangement copes efficiently with permanent faults.
Scrubbing copes with transient faults.
The Markov chain models characterize performance under dynamic reconfiguration and under error detection and correction.
Abstract
Single Event Upsets (SEU) as well as permanent faults can significantly affect the correct on-line operation of digital systems, such as memories and microprocessors; a memory can be made resilient to permanent and transient faults by using modular redundancy and coding. In this paper, different memory systems are compared: these systems use simplex and duplex arrangements with a combination of Reed Solomon coding and scrubbing. The memory systems and their operations are analyzed by novel Markov chains to characterize performance for dynamic reconfiguration as well as error detection and correction under the occurrence of permanent and transient faults. For a specific Reed Solomon code, the duplex arrangement copes efficiently with permanent faults, while scrubbing copes with transient faults.
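To illustrate the kind of Markov model the abstract refers to, here is a minimal sketch. It is not the authors' model. It describes a single simplex RS(n, k) codeword that corrects up to t = (n - k) // 2 symbol errors. Transient upsets hit the codeword at a hypothetical per-symbol rate `lam`, and scrubbing occurs at a hypothetical rate `mu`. Scrubbing is assumed to be exponentially distributed, each upset is assumed to corrupt a symbol that is still correct, and neither the duplex arrangement nor permanent faults are modeled. All function names are illustrative. The chain is solved by uniformization, which is a standard transient-analysis technique for continuous-time Markov chains.

```python
import math


def build_generator(n, t, lam, mu):
    """Generator Q for one RS(n, k) word (assumed model, not the paper's).

    States 0..t: number of erroneous symbols (still correctable).
    State t + 1: uncorrectable word (absorbing failure state).
    """
    size = t + 2
    Q = [[0.0] * size for _ in range(size)]
    for i in range(t + 1):
        # Assumption: a new upset hits one of the n - i still-correct symbols.
        Q[i][i + 1] += (n - i) * lam
        if i > 0:
            # Scrubbing decodes, corrects and rewrites the word, clearing all errors.
            Q[i][0] += mu
    for i in range(size):
        Q[i][i] = -sum(Q[i][j] for j in range(size) if j != i)
    return Q


def transient_distribution(Q, pi0, T, max_step=5.0, tol=1e-12):
    """State probabilities at time T via uniformization, split into sub-steps
    of at most max_step expected jumps so exp(-a) does not underflow."""
    size = len(Q)
    Lam = max(-Q[i][i] for i in range(size))
    if Lam == 0.0:
        return list(pi0)
    # Uniformized DTMC: P = I + Q / Lam (all entries non-negative).
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / Lam for j in range(size)]
         for i in range(size)]
    steps = max(1, math.ceil(Lam * T / max_step))
    a = Lam * T / steps  # expected uniformized jumps per sub-step
    pi = list(pi0)
    for _ in range(steps):
        term = list(pi)        # pi * P^k, starting at k = 0
        weight = math.exp(-a)  # Poisson(k; a) at k = 0
        acc = [weight * x for x in term]
        cum, k = weight, 0
        while 1.0 - cum > tol and k < 10_000:
            k += 1
            term = [sum(term[i] * P[i][j] for i in range(size)) for j in range(size)]
            weight *= a / k
            cum += weight
            acc = [acc[j] + weight * term[j] for j in range(size)]
        pi = acc
    return pi


def reliability(n, t, lam, mu, T):
    """Probability that the word is still correctable at time T."""
    pi0 = [1.0] + [0.0] * (t + 1)
    pi = transient_distribution(build_generator(n, t, lam, mu), pi0, T)
    return 1.0 - pi[-1]


def binomial_reliability(n, t, lam, T):
    """Closed form without scrubbing: each symbol is corrupted independently
    with probability p = 1 - exp(-lam * T); the word survives if at most t are."""
    p = 1.0 - math.exp(-lam * T)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))


if __name__ == "__main__":
    # Hypothetical parameters: RS(15, 11) over GF(16), so t = 2.
    n, k = 15, 11
    t = (n - k) // 2
    lam, T = 1e-3, 100.0
    print("no scrubbing:  ", reliability(n, t, lam, 0.0, T))
    print("closed form:   ", binomial_reliability(n, t, lam, T))
    print("scrub mu = 0.1:", reliability(n, t, lam, 0.1, T))
```

With `mu = 0` the chain reduces to n independent symbol lifetimes, so `binomial_reliability` serves as a closed-form cross-check. Adding scrubbing raises reliability, because errors are cleared before more than t of them can accumulate in the word.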
Taxonomy
Topics: Radiation Effects in Electronics · Distributed systems and fault tolerance · Interconnection Networks and Systems
