Not All Faults Are Equal: Transient-Fault Sensitivity Characterization of an Open-Source RISC-V Vector Cluster
Maoyuan Cai, Amirhossein Kiamarzi, Davide Rossi, Angelo Garofalo

TL;DR
This study analyzes the transient-fault sensitivity of an open-source RISC-V vector cluster. It identifies the dominant fault manifestations and shows how their impact varies across numerical precisions, underscoring the need for targeted fault-protection strategies.
Contribution
It provides a comprehensive fault-sensitivity characterization of the RISC-V vector cluster under SET and SEU fault models and MatMul and Widening MatMul workloads. It identifies the most fault-prone modules and how fault impact varies across precisions.
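The SEU model corresponds to a single flipped bit in a storage element. As a rough host-side illustration of why the position of that bit matters, the sketch below flips one bit in the IEEE 754 encoding of an FP32 or FP16 operand using Python's `struct` module. The helper `flip_bit` and its interface are hypothetical. The sketch does not reproduce the paper's injection campaign, and it does not cover the BF16 and FP8 formats, which `struct` does not support.

```python
import struct

# IEEE 754 binary32 ("f") and binary16 ("e") packing formats, each paired with an
# unsigned integer format of the same width and the width in bits.
_FORMATS = {
    "fp32": ("<f", "<I", 32),
    "fp16": ("<e", "<H", 16),
}


def flip_bit(value: float, bit: int, precision: str = "fp32") -> float:
    """Return `value` with bit `bit` of its encoding inverted.

    Hypothetical helper that mimics an SEU (single stored-bit flip) on a
    floating-point operand; bit 0 is the least significant mantissa bit.
    """
    float_fmt, int_fmt, width = _FORMATS[precision]
    if not 0 <= bit < width:
        raise ValueError(f"bit must be in [0, {width}) for {precision}")
    # Reinterpret the float's bytes as an unsigned integer, XOR one bit, convert back.
    raw = struct.unpack(int_fmt, struct.pack(float_fmt, value))[0]
    return struct.unpack(float_fmt, struct.pack(int_fmt, raw ^ (1 << bit)))[0]


if __name__ == "__main__":
    x = 1.5
    # Mantissa bits perturb the value slightly, the sign bit negates it, and
    # exponent bits can rescale it by powers of two or make it non-finite.
    for bit in (0, 22, 29, 30, 31):
        print(f"fp32 bit {bit:2d}: {x} -> {flip_bit(x, bit)}")
    for bit in (0, 9, 14, 15):
        print(f"fp16 bit {bit:2d}: {x} -> {flip_bit(x, bit, 'fp16')}")
```

Flipping a low mantissa bit changes 1.5 only in its last place. Flipping the top exponent bit sets the exponent field to all ones, which produces a NaN. This is one reason identical single-bit faults can have very different output impact.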
Findings
Faulty data corruption (FD) dominates manifesting errors in all evaluated workloads.
SET sensitivity is concentrated in the vector execution path; TCDM is the major contributor to FD manifestations.
FP8 shows the lowest output impact; FP16 Widening MatMul reduces corruption spread and RMSE compared with FP16 MatMul.
Abstract
We present a transient-fault sensitivity study of the open-source RISC-V vector cluster Spatz under SET and SEU fault models. Across 100,000 fault injections on six MatMul and Widening MatMul configurations, faulty data corruption (FD) is the dominant manifesting outcome for all evaluated workloads, accounting for at least 86% of manifesting errors in the SET campaigns and at least 91% in the SEU campaigns. At the module level, SET sensitivity is concentrated in the vector execution path, while TCDM is the major contributor to FD manifestations. We further quantify SDC severity across FP32, FP16, BF16, and FP8 by analyzing both the average number of corrupted outputs and their RMSE. FP8 shows the lowest output impact overall, while FP16 Widening MatMul reduces both corruption spread and RMSE compared with FP16 MatMul. By contrast, the effect of widening on FP8 is limited in our…
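The abstract names two SDC severity measures: the number of corrupted outputs and their RMSE. It also reports the FD share of manifesting errors. The sketch below shows one way such quantities could be computed from a fault-free golden run and a faulty run. It rests on several assumptions that do not come from the paper:

- Outputs are flat lists of floats.
- Any element that differs from the golden output counts as corrupted.
- RMSE is averaged over all output elements.
- The outcome labels `"masked"` and `"FD"`, like the functions `sdc_severity` and `fd_share`, are hypothetical names.

```python
import math
from collections import Counter


def sdc_severity(golden, faulty):
    """Return (corrupted_count, rmse) for one fault-injection run.

    Assumptions (not taken from the paper): outputs are flat sequences of
    floats, an element is "corrupted" if it differs from the golden value at
    all, and the RMSE is averaged over every output element. A NaN or inf in
    `faulty` makes the RMSE non-finite.
    """
    if len(golden) != len(faulty) or not golden:
        raise ValueError("outputs must be non-empty and of equal length")
    corrupted = sum(1 for g, f in zip(golden, faulty) if g != f)
    # Multiply instead of ** 2 so very large errors overflow to inf, not OverflowError.
    mse = sum((f - g) * (f - g) for g, f in zip(golden, faulty)) / len(golden)
    return corrupted, math.sqrt(mse)


def fd_share(outcomes):
    """Fraction of manifesting runs classified as faulty data (FD).

    `outcomes` holds one label per injection; "masked" (no visible effect) and
    "FD" are hypothetical label names, and every non-masked label is treated
    as a manifesting error.
    """
    counts = Counter(outcomes)
    manifesting = sum(n for label, n in counts.items() if label != "masked")
    return counts["FD"] / manifesting if manifesting else 0.0


if __name__ == "__main__":
    golden = [1.0, 2.0, 3.0, 4.0]
    faulty = [1.0, 2.0, 3.5, 4.0]  # one corrupted element, off by 0.5
    print(sdc_severity(golden, faulty))
    runs = ["masked"] * 90 + ["FD"] * 9 + ["hang"]
    print(fd_share(runs))
```

To obtain an average number of corrupted outputs per precision, the per-run counts from `sdc_severity` would be averaged over all SDC runs of a campaign, for example with `statistics.mean`.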
