Adaptive Soft Error Protection for Neural Network Processing
Xinghua Xue, Cheng Liu, Feng Min, Yinhe Han

TL;DR
This paper introduces an adaptive, input-aware fault tolerance framework for neural networks that uses a lightweight GNN to predict vulnerabilities and dynamically adjust protection, reducing overhead while maintaining accuracy.
Contribution
The paper presents a GNN-based vulnerability prediction method that enables real-time adaptive fault tolerance in neural networks and improves efficiency over static protection methods.
Findings
The GNN predictor achieves over 95% accuracy in identifying critical inputs and components.
The adaptive scheme reduces computational overhead by 42.12% on average.
The adaptive scheme outperforms traditional static protection methods in efficiency while preserving accuracy.
Abstract
Previous research on selective protection for neural network components typically exploits only static vulnerability differences. Although these methods improve upon classical modular redundancy, they still incur substantial overhead for neural network workloads that are both memory-intensive and compute-intensive. In this work, we observe that neural network vulnerability is also input-dependent and varies dynamically at runtime. With this observation, we propose an adaptive, vulnerability-aware fault tolerance framework. At its core, a lightweight graph neural network (GNN) model dynamically predicts soft error vulnerabilities across inputs and neural network components, enabling real-time adaptation of fault tolerance policies. This design offers a complementary and more efficient protection scheme compared to traditional approaches. Experimental results demonstrate that the GNN…
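The idea of adapting protection per input can be illustrated with a minimal sketch. It is not the authors' implementation: the threshold rule, the function `plan_protection`, the `ProtectionPlan` type, and the layer names and costs are all hypothetical, and the per-component scores are supplied by hand where the paper would use the output of its GNN predictor.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProtectionPlan:
    protected: List[str]   # components that receive redundancy for this input
    relative_cost: float   # cost relative to protecting every component


def plan_protection(scores: Dict[str, float],
                    costs: Dict[str, float],
                    threshold: float = 0.5) -> ProtectionPlan:
    """Protect only components whose predicted vulnerability meets the threshold.

    `scores` stands in for the per-component output of a vulnerability
    predictor (a GNN in the paper); here the caller supplies it directly.
    The threshold policy is an assumption made for illustration.
    """
    protected = [name for name, score in scores.items() if score >= threshold]
    total = sum(costs.values())
    used = sum(costs[name] for name in protected)
    return ProtectionPlan(protected, used / total if total else 0.0)


# Hypothetical per-layer protection costs and predicted scores for two inputs.
LAYER_COSTS = {"conv1": 1.0, "conv2": 2.0, "fc": 1.0}
easy_input = plan_protection({"conv1": 0.2, "conv2": 0.1, "fc": 0.7}, LAYER_COSTS)
hard_input = plan_protection({"conv1": 0.9, "conv2": 0.6, "fc": 0.8}, LAYER_COSTS)
```

With these made-up numbers, the first input triggers protection of a single layer while the second triggers protection of all three. A static scheme would have to pay the same cost for both inputs, which is the gap an input-aware scheme is meant to close.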
