Evaluating the Effectiveness of Microarchitectural Hardware Fault Detection for Application-Specific Requirements
Konstantinos-Nikolaos Papadopoulos, Christina Giannoula, Nikolaos-Charalampos Papadopoulos, Nektarios Koziris, José M.G. Merayo, Dionisios N. Pnevmatikatos

TL;DR
This paper compares three hardware fault detection methods in processors, analyzing their effectiveness and trade-offs across diverse safety-critical applications to guide practical fault tolerance design.
Contribution
It provides a comprehensive evaluation of DMR, R-SMT, and ParDet methods considering multiple metrics and real application requirements, highlighting their suitability and limitations.
Findings
Microarchitectural methods achieve robustness comparable to that of DMR.
Trade-offs vary significantly depending on application needs.
Certain microarchitectural methods are unsuitable for specific scenarios.
Abstract
Reliability is necessary in safety-critical applications spanning numerous domains. Conventional hardware-based fault tolerance techniques, such as component redundancy, ensure reliability, typically at the expense of significantly increased power consumption, and almost double (or more) hardware area. To mitigate these costs, microarchitectural fault tolerance methods try to lower overheads by leveraging microarchitectural insights, but prior evaluations focus primarily on only application performance. As different safety-critical applications prioritize different requirements beyond reliability, evaluating only limited metrics cannot guarantee that microarchitectural methods are practical and usable for all different application scenarios. To this end, in this work, we extensively characterize and compare three fault detection methods, each representing a different major fault…
Taxonomy
Topics: Manufacturing Process and Optimization · Industrial Vision Systems and Defect Detection · Radiation Effects in Electronics
