Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN)
Jungwook Choi, Pierce I-Jen Chuang, Zhuo Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan

TL;DR
This paper introduces novel techniques for 2-bit quantization of neural networks, achieving accuracy comparable to full-precision models by separately optimizing weight and activation quantizations.
Contribution
It presents PACT and SAWB, two methods for activation and weight quantization, respectively, that together enable high-accuracy 2-bit QNNs without exhaustive search.
Findings
Achieves state-of-the-art accuracy for 2-bit QNNs.
PACT optimizes activation clipping during training.
SAWB minimizes quantization error based on weight statistics.
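As an illustration of the PACT finding, the following is a minimal sketch of a clipped, uniformly quantized activation. The summary does not give the formula, so the details here are assumptions: the activation is clipped to the range [0, alpha], the clipped value is rounded onto 2^bits evenly spaced levels, and the gradient with respect to alpha uses a straight-through estimate. The names `pact_forward` and `pact_alpha_grad` are hypothetical.

```python
def pact_forward(x, alpha, bits=2):
    """Clip x to [0, alpha], then round onto 2**bits evenly spaced levels.

    Assumption: alpha is the learnable clipping parameter (alpha > 0).
    """
    levels = 2 ** bits - 1          # number of quantization intervals
    clipped = min(max(x, 0.0), alpha)
    step = alpha / levels           # quantization scale set by alpha
    return round(clipped / step) * step


def pact_alpha_grad(x, alpha):
    """Straight-through estimate of d(output)/d(alpha) (an assumption here).

    Inputs at or above alpha are clipped, so the output moves with alpha;
    inputs below alpha are treated as unaffected by it.
    """
    return 1.0 if x >= alpha else 0.0


if __name__ == "__main__":
    alpha = 3.0
    for value in [-1.0, 0.4, 1.2, 5.0]:
        print(value, pact_forward(value, alpha), pact_alpha_grad(value, alpha))
```

With `alpha = 3.0` and 2 bits the representable outputs are 0, 1, 2, and 3, so a larger alpha widens the range at the cost of coarser steps. That trade-off is what training alpha is meant to settle.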
Abstract
Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. In order to reduce this cost, several quantization schemes have gained attention recently with some focusing on weight quantization, and others focusing on quantizing activations. This paper proposes novel techniques that target weight and activation quantizations separately resulting in an overall quantized neural network (QNN). The activation quantization technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter that is optimized during training to find the right quantization scale. The weight quantization scheme, statistics-aware weight binning (SAWB), finds the optimal scaling factor that minimizes the quantization error based on the statistical characteristics of the distribution of weights without the need for an exhaustive search.…
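To make the weight-side objective concrete, the sketch below computes the quantization error that a scaling factor induces, along with the brute-force search that the abstract says SAWB avoids. It is not the authors' method. The symmetric 2-bit level layout, the function names, and the candidate grid are assumptions for illustration, and the mapping from weight statistics to the scale is not given in this summary, so it is left out.

```python
def quantize_weights(weights, alpha, bits=2):
    """Clip weights to [-alpha, alpha] and snap them to 2**bits evenly
    spaced levels (assumed symmetric layout, e.g. -a, -a/3, a/3, a)."""
    intervals = 2 ** bits - 1
    step = 2.0 * alpha / intervals
    quantized = []
    for w in weights:
        clipped = min(max(w, -alpha), alpha)
        index = round((clipped + alpha) / step)
        quantized.append(-alpha + index * step)
    return quantized


def quantization_mse(weights, alpha, bits=2):
    """Mean squared error between the weights and their quantized values."""
    quantized = quantize_weights(weights, alpha, bits)
    return sum((w - q) ** 2 for w, q in zip(weights, quantized)) / len(weights)


def weight_statistics(weights):
    """First and second absolute moments, E[|w|] and E[w^2]: the kind of
    summary statistics a statistics-aware rule could use in place of a search."""
    n = len(weights)
    mean_abs = sum(abs(w) for w in weights) / n
    mean_sq = sum(w * w for w in weights) / n
    return mean_abs, mean_sq


def exhaustive_search_alpha(weights, candidates, bits=2):
    """Baseline: try every candidate scale and keep the lowest-error one."""
    return min(candidates, key=lambda a: quantization_mse(weights, a, bits))


if __name__ == "__main__":
    weights = [-1.0, -0.3, 0.3, 1.0]
    print(exhaustive_search_alpha(weights, [0.5, 1.0, 2.0]))
    print(weight_statistics(weights))
```

The search costs one full pass over the weights per candidate scale, whereas the two moments need a single pass. That is the saving a statistics-based estimate of the scale would offer.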
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
