Efficient Error-Tolerant Quantized Neural Network Accelerators
Giulio Gambardella, Johannes Kappauf, Michaela Blott, Christoph Doehring, Martin Kumm, Peter Zipf, Kees Vissers

TL;DR
This paper evaluates the fault tolerance of quantized neural networks in hardware accelerators, revealing their vulnerability and proposing methods to improve robustness with minimal redundancy.
Contribution
It introduces a methodology using FPGA-based error injection to assess fault impact on QNNs and proposes two novel fault mitigation techniques.
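To make the idea of error injection concrete, here is a minimal software sketch. It is not the authors' FPGA-based methodology: it assumes a single binarized neuron (weights and activations in {-1, +1}), a 16-bit two's-complement accumulator, and a single bit flip as the fault model. All function names are hypothetical and chosen for illustration only.

```python
import random


def quantized_dot(weights, activations):
    """Integer dot product, standing in for one neuron of a QNN."""
    return sum(w * a for w, a in zip(weights, activations))


def flip_bit(value, bit, width=16):
    """Flip one bit of a two's-complement integer of the given width."""
    mask = (1 << width) - 1
    raw = (value & mask) ^ (1 << bit)
    # Reinterpret the raw bit pattern as a signed integer.
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw


def sign_flip_rate(bit, n_inputs=64, trials=1000, width=16, seed=0):
    """Fraction of random trials in which a single bit flip in the
    accumulator changes the binarized output (the sign of the sum).
    Assumed fault model: one bit flip per evaluation."""
    rng = random.Random(seed)
    changed = 0
    for _ in range(trials):
        w = [rng.choice((-1, 1)) for _ in range(n_inputs)]
        a = [rng.choice((-1, 1)) for _ in range(n_inputs)]
        clean = quantized_dot(w, a)
        faulty = flip_bit(clean, bit, width)
        if (clean >= 0) != (faulty >= 0):
            changed += 1
    return changed / trials


if __name__ == "__main__":
    for bit in (0, 4, 15):
        print(f"bit {bit}: output changed in {sign_flip_rate(bit):.1%} of trials")
```

In this toy model a flip in the sign bit always changes the neuron's output, while flips in lower bits matter only when the accumulated sum is close to zero. Measuring accuracy loss in a real network would require injecting faults into a full inference pipeline, which this sketch does not attempt.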
Findings
QNNs with convolutional layers are less fault-tolerant than previously believed.
Faults can cause accuracy drops of up to 10%.
Proposed methods improve robustness with less redundancy.
Abstract
Neural Networks are currently one of the most widely deployed machine learning algorithms. In particular, Convolutional Neural Networks (CNNs) are gaining popularity and are being evaluated for deployment in safety-critical applications such as self-driving vehicles. Modern CNNs feature enormous memory bandwidth and high computational needs, challenging existing hardware platforms to meet throughput, latency and power requirements. Functional safety and error tolerance need to be considered as an additional requirement in safety-critical systems. In general, fault-tolerant operation can be achieved by adding redundancy to the system, which further exacerbates the computational demands. Furthermore, the question arises whether pruning and quantization methods for performance scaling turn out to be counterproductive with regard to fail-safety requirements. In this work we present a…
Taxonomy
Methods: Pruning
