Fault Injectors for TensorFlow: Evaluation of the Impact of Random Hardware Faults on Deep CNNs
Michael Beyer, Andrey Morozov, Emil Valiev, Christoph Schorn, Lydia Gauerhof, Kai Ding, Klaus Janschek

TL;DR
This paper introduces two fault injection frameworks for TensorFlow that evaluate the impact of random hardware faults on deep CNNs, aiding in understanding and improving fault tolerance in safety-critical AI applications.
Contribution
The paper presents two fault injection tools, InjectTF for TensorFlow 1 and InjectTF2 for TensorFlow 2, that inject configurable random faults into neural networks to assess their robustness against hardware faults.
Findings
Random bit flips significantly affect classification accuracy.
Certain layers and operations are more critical to network reliability.
Frameworks help identify critical components for fault tolerance.
Abstract
Today, Deep Learning (DL) enhances almost every industrial sector, including safety-critical areas. The next generation of safety standards will define appropriate verification techniques for DL-based applications and propose adequate fault tolerance mechanisms. DL-based applications, like any other software, are susceptible to common random hardware faults such as bit flips, which occur in RAM and CPU registers. Such faults can lead to silent data corruption. Therefore, it is crucial to develop methods and tools that help to evaluate how DL components operate under the presence of such faults. In this paper, we introduce two new Fault Injection (FI) frameworks InjectTF and InjectTF2 for TensorFlow 1 and TensorFlow 2, respectively. Both frameworks are available on GitHub and allow the configurable injection of random faults into Neural Networks (NN). In order to demonstrate the…
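To make the fault model concrete, here is a minimal sketch of a single bit flip in a 32-bit floating-point value, written with plain NumPy. It is not the InjectTF or InjectTF2 API; the function name `flip_bit` and its parameters are hypothetical and chosen only for illustration. It assumes values are stored as IEEE 754 float32, where bit 31 is the sign, bits 30 to 23 are the exponent, and bits 22 to 0 are the mantissa.

```python
import numpy as np


def flip_bit(values, index, bit):
    """Return a float32 copy of `values` with one bit flipped in one element.

    Hypothetical helper for illustration only; not part of InjectTF/InjectTF2.
    `index` addresses the flattened array, `bit` is 0 (LSB) to 31 (sign).
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in the range 0..31")
    flipped = np.array(values, dtype=np.float32)  # copy; input stays untouched
    bits = flipped.view(np.uint32)                # same memory, seen as integers
    bits.flat[index] ^= np.uint32(1) << np.uint32(bit)
    return flipped


weights = np.array([1.0, 0.5], dtype=np.float32)

sign_fault = flip_bit(weights, 0, 31)      # 1.0 becomes -1.0
exponent_fault = flip_bit(weights, 0, 30)  # 1.0 becomes +inf
mantissa_fault = flip_bit(weights, 0, 0)   # 1.0 changes only in the last place

print(sign_fault[0], exponent_fault[0], mantissa_fault[0])
```

The effect of a flip depends heavily on its position: a fault in the lowest mantissa bit is practically invisible, while a fault in the highest exponent bit can turn an ordinary weight into infinity or a very large value. Neither case raises an error, which is what makes such corruption silent.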
