FAT: Training Neural Networks for Reliable Inference Under Hardware Faults

Ussama Zahid; Giulio Gambardella; Nicholas J. Fraser; Michaela Blott; Kees Vissers

arXiv:2011.05873·cs.LG·November 12, 2020

TL;DR

This paper introduces fault-aware training (FAT), a method that injects faults during neural network training to improve the fault tolerance of quantized neural networks (QNNs), especially in safety-critical embedded applications.

Contribution

The paper proposes a new fault-aware training methodology that enhances the fault resilience of quantized neural networks by incorporating error modeling during training.
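As a rough illustration of what injecting faults during training can look like, the sketch below forces a random subset of quantized activation values to a fixed "stuck" value before they are passed on. This is a minimal sketch under stated assumptions, not the authors' implementation: the stuck-at fault model, the helper names `quantize` and `inject_stuck_at`, and the parameters `fault_rate` and `stuck_value` are all hypothetical choices made for illustration.

```python
import random


def quantize(x, bits=4):
    """Uniformly quantize x in [-1, 1] to a signed integer of `bits` bits.

    Hypothetical helper: the summary does not specify the quantization scheme.
    """
    levels = 2 ** (bits - 1)
    q = round(x * levels)
    # Clamp to the representable signed range [-levels, levels - 1].
    return max(-levels, min(levels - 1, q))


def inject_stuck_at(values, fault_rate, stuck_value=0, rng=None):
    """Return a copy of `values` where each entry is independently replaced
    by `stuck_value` with probability `fault_rate`.

    Assumption: a simple stuck-at fault model applied to layer outputs.
    """
    if rng is None:
        rng = random.Random()
    return [stuck_value if rng.random() < fault_rate else v for v in values]


# Example: corrupt a small vector of quantized activations.
rng = random.Random(0)
activations = [quantize(rng.uniform(-1.0, 1.0)) for _ in range(16)]
faulty = inject_stuck_at(activations, fault_rate=0.25, stuck_value=0, rng=rng)
```

In a training loop, a step like `inject_stuck_at` would be applied to intermediate outputs during the forward pass, so the network learns weights that remain accurate when some values are corrupted.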

Findings

01

Fault injection during training improves error tolerance of CNNs.

02

Redundant systems from FAT-trained QNNs achieve higher worst-case accuracy.

03

Validated on CIFAR10, GTSRB, SVHN, and ImageNet datasets.
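The finding about redundant systems can be made concrete with a small voting sketch. The summary does not say which redundancy scheme the paper evaluates, so the example below assumes plain majority voting over the class predictions of several replicas (three in the usage lines); the function name `majority_vote` and the tie-breaking rule are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common class label among replica predictions.

    Assumption: ties are broken in favour of the label seen first, which is
    the order Counter.most_common uses for equal counts.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


# Two of three replicas agree, so the single faulty output is outvoted.
agreed = majority_vote([3, 3, 5])

# All replicas disagree: the first replica's label is returned.
fallback = majority_vote([4, 5, 6])
```

Under this scheme a single faulty replica is masked, while the worst case arises when faults cause two or more replicas to err, which is where more fault-tolerant individual networks would help.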

Abstract

Deep neural networks (DNNs) are state-of-the-art algorithms for multiple applications, spanning from image classification to speech recognition. While providing excellent accuracy, they often have enormous compute and memory requirements. As a result of this, quantized neural networks (QNNs) are increasingly being adopted and deployed especially on embedded devices, thanks to their high accuracy, but also since they have significantly lower compute and memory requirements compared to their floating point equivalents. QNN deployment is also being evaluated for safety-critical applications, such as automotive, avionics, medical or industrial. These systems require functional safety, guaranteeing failure-free behaviour even in the presence of hardware faults. In general fault tolerance can be achieved by adding redundancy to the system, which further exacerbates the overall computational…
