TL;DR
NICE introduces a noise injection and learned clamping method for neural network quantization, significantly improving accuracy on low-bit models and enabling efficient real-time deployment on FPGA hardware.
Contribution
The paper proposes a novel quantization training method using noise injection and learned clamping, achieving state-of-the-art results for low-bit neural networks.
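As a rough illustration of the two ingredients named above, the following NumPy sketch shows clamping followed by uniform quantization, and uniform noise used as a training-time stand-in for rounding error. It is not the paper's implementation: the function names, the fixed `clamp_max` value (which would be a trainable parameter under learned clamping), the unsigned activation range, and the noise width of one quantization step are all assumptions made for this example.

```python
import numpy as np


def clamp_and_quantize(x, clamp_max, num_bits):
    """Clamp x to [0, clamp_max], then round to a uniform grid.

    clamp_max is a plain float here; with learned clamping it would be
    a parameter updated by the optimizer (assumption for illustration).
    """
    levels = 2 ** num_bits - 1          # number of steps between 0 and clamp_max
    step = clamp_max / levels           # width of one quantization bin
    clipped = np.clip(x, 0.0, clamp_max)
    return np.round(clipped / step) * step


def inject_quantization_noise(w, num_bits, rng):
    """Add uniform noise one quantization step wide instead of rounding.

    This mimics the error a uniform quantizer would introduce while
    keeping the weights continuous (hypothetical helper, not from the paper).
    """
    levels = 2 ** num_bits - 1
    step = (w.max() - w.min()) / levels
    noise = rng.uniform(-step / 2, step / 2, size=w.shape)
    return w + noise


rng = np.random.default_rng(0)

# 3-bit activations with an assumed clamp value of 1.4.
activations = np.array([-0.5, 0.2, 0.9, 3.0])
quantized = clamp_and_quantize(activations, clamp_max=1.4, num_bits=3)

# 3-bit noise injection on randomly drawn weights.
weights = rng.normal(size=1000)
noisy = inject_quantization_noise(weights, num_bits=3, rng=rng)
```

Values below zero map to 0 and values above `clamp_max` map to `clamp_max`, so the choice of clamp value trades clipping error against rounding error; that trade-off is what makes a learned clamp attractive.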
Findings
Achieves high accuracy with 3-bit weights and activations on ImageNet.
Demonstrates an FPGA implementation for low-power, real-time applications.
Outperforms previous quantization methods in accuracy and efficiency.
Abstract
Convolutional Neural Networks (CNNs) are very popular in many fields, including computer vision, speech recognition, and natural language processing, to name a few. Though deep learning leads to groundbreaking performance in these domains, the networks used are very demanding computationally and are far from real-time even on a GPU, which is not power efficient and therefore does not suit low-power systems such as mobile devices. To overcome this challenge, some solutions have been proposed for quantizing the weights and activations of these networks, which accelerate the runtime significantly. Yet, this acceleration comes at the cost of a larger error. The NICE method proposed in this work trains quantized neural networks by noise injection and learned clamping, which improve the accuracy. This leads to state-of-the-art results on various regression and classification tasks, e.g.,…
