The Study of Transient Faults Propagation in Multithread Applications
Navid Khoshavi, Armin Samiei

TL;DR
This paper proposes an adaptive, application-relevant architecture for CMPs that leverages STT-RAM to improve soft-error resilience in multithreaded applications, focusing on targeted instruction reallocation.
Contribution
It introduces a novel approach using STT-RAM for critical instruction storage to enhance CMP reliability against SEUs and MBUs, with detailed analysis and targeted application strategies.
Findings
STT-RAM improves soft-error resilience compared to SRAM.
Targeted instruction reallocation enhances overall CMP reliability.
Analysis shows specific application segments benefit most from STT-RAM use.
Abstract
Contemporary Error Correcting Code (ECC) designs occupy a significant fraction of total die area in chip-multiprocessors (CMPs), so approaches are sought to address the growing vulnerability of CMP architectures to Single Event Upsets (SEUs) and Multi-Bit Upsets (MBUs). In this paper, we focus on the reliability assessment of multithreaded applications running on CMPs and propose an adaptive, application-relevant architecture design that accommodates the impact of both SEUs and MBUs across the entire CMP architecture. This work concentrates on leveraging the intrinsic soft-error immunity of Spin-Transfer Torque RAM (STT-RAM) as an alternative to SRAM-based storage and operation components. We target a specific portion of the working set for reallocation to improve the reliability level of the CMP architecture design. A selected portion of instructions in a multithreaded program which…
Taxonomy
Topics: Radiation Effects in Electronics · Software System Performance and Reliability · Software Testing and Debugging Techniques
