One Weight Bitwidth to Rule Them All
Ting-Wu Chin, Pierce I-Jen Chuang, Vikas Chandra, Diana Marculescu

TL;DR
This paper shows that using a single weight bitwidth across a whole neural network can achieve higher accuracy than mixed-precision quantization when both are constrained to the same model size, and that a single bitwidth also simplifies deployment.
Contribution
It introduces the insight that a uniform weight bitwidth can be more effective than mixed-precision quantization under size constraints, challenging common assumptions.
Findings
A single bitwidth can outperform mixed precision at the same model size.
Uniform bitwidth simplifies hardware/software support.
Optimality depends on model size and channel count.
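To make the link between bitwidth, channel count, and model size concrete, here is a minimal sketch of the storage arithmetic for one convolutional layer. It assumes weight storage is simply the number of weights times the bitwidth, and that input and output channels are scaled by the same factor. The function names and the layer shape are hypothetical and are not taken from the paper.

```python
import math


def conv_weight_bits(c_in: int, c_out: int, kernel: int, bitwidth: int) -> int:
    """Bits needed to store one conv layer's weights (biases ignored)."""
    return c_in * c_out * kernel * kernel * bitwidth


def width_scale_for_equal_size(base_bitwidth: int, new_bitwidth: int) -> float:
    """Channel multiplier that keeps weight storage constant.

    Assumes c_in and c_out are both scaled by the same factor s, so the
    weight count grows by s**2 and must offset the change in bitwidth.
    """
    return math.sqrt(base_bitwidth / new_bitwidth)


# Hypothetical layer: 64 -> 64 channels, 3x3 kernel, 8-bit weights.
base_bits = conv_weight_bits(64, 64, 3, 8)

# Moving to 2-bit weights frees enough storage to double both channel counts.
scale = width_scale_for_equal_size(8, 2)
wide_channels = round(64 * scale)
wide_bits = conv_weight_bits(wide_channels, wide_channels, 3, 2)

print(base_bits, wide_bits, scale)
```

Under these assumptions, a wider 2-bit layer and a narrower 8-bit layer occupy the same storage, which is the kind of equal-size comparison the findings refer to.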
Abstract
Weight quantization for deep ConvNets has shown promising results for applications such as image classification and semantic segmentation and is especially important for applications where memory storage is limited. However, when aiming for quantization without accuracy degradation, different tasks may end up with different bitwidths. This creates complexity for software and hardware support and the complexity accumulates when one considers mixed-precision quantization, in which case each layer's weights use a different bitwidth. Our key insight is that optimizing for the least bitwidth subject to no accuracy degradation is not necessarily an optimal strategy. This is because one cannot decide optimality between two bitwidths if one has a smaller model size while the other has better accuracy. In this work, we take the first step to understand if some weight bitwidth is better than…
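The abstract's point that two bitwidths cannot be ranked when one is smaller and the other is more accurate can be read as a Pareto-dominance check. The sketch below is an illustration of that reading, not the authors' procedure. The `Config` type and all sizes and accuracies are made-up placeholder values, not results from the paper.

```python
from typing import NamedTuple, Optional


class Config(NamedTuple):
    name: str
    size_mb: float   # model size (hypothetical values)
    accuracy: float  # top-1 accuracy in percent (hypothetical values)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.size_mb <= b.size_mb and a.accuracy >= b.accuracy
    strictly_better = a.size_mb < b.size_mb or a.accuracy > b.accuracy
    return no_worse and strictly_better


def better_of(a: Config, b: Config) -> Optional[str]:
    """Name of the dominating config, or None if the two are incomparable."""
    if dominates(a, b):
        return a.name
    if dominates(b, a):
        return b.name
    return None


# Smaller but less accurate vs. larger but more accurate: no winner.
small = Config("2-bit", size_mb=1.0, accuracy=68.0)
large = Config("8-bit", size_mb=4.0, accuracy=70.0)
print(better_of(small, large))

# Same size, different accuracy: the comparison is decidable.
wide_small = Config("2-bit, wider", size_mb=4.0, accuracy=71.0)
print(better_of(wide_small, large))
```

Fixing the model size, as in the second comparison, removes one axis and leaves accuracy as the only criterion.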
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
