Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance
Chen Tang, Kai Ouyang, Zhi Wang, Yifei Zhu, Yaowei Wang, Wen Ji, and Wenwu Zhu

TL;DR
This paper introduces a fast, importance-based method for mixed-precision neural network quantization that significantly reduces search time and achieves state-of-the-art accuracy on ImageNet.
Contribution
It proposes a joint training scheme that learns layer-wise importance indicators and formulates the bit-width search as a single integer linear programming (ILP) problem, greatly improving search efficiency.
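The "single ILP" step can be illustrated on a toy problem. The sketch below rests on two simplifying assumptions that are not the paper's exact formulation. First, the objective is the sum of learned importance values over the chosen (layer, bit-width) pairs. Second, the only constraint is a total weight-storage budget in bits. `solve_bitwidth_ilp` is a hypothetical name. In place of a general ILP solver, the sketch solves the equivalent multiple-choice knapsack exactly with dynamic programming, which works because there is one one-hot choice per layer and a single integer budget.

```python
from typing import Dict, List, Tuple


def solve_bitwidth_ilp(
    importance: List[Dict[int, float]],  # importance[l][b]: hypothetical indicator for layer l at b bits
    params: List[int],                   # number of weights in each layer
    budget_bits: int,                    # total weight-storage budget, in bits
) -> Tuple[float, List[int]]:
    """Choose exactly one bit-width per layer to maximize total importance
    subject to sum(params[l] * b_l) <= budget_bits.

    Hypothetical helper: the one-hot ILP is solved exactly as a
    multiple-choice knapsack by dynamic programming over the integer budget,
    which is only practical for small budgets.
    """
    neg = float("-inf")
    best = [neg] * (budget_bits + 1)  # best[c]: max importance at exact cost c
    best[0] = 0.0
    choices: List[List[int]] = []     # choices[l][c]: bit-width picked for layer l at cost c
    for l, opts in enumerate(importance):
        new_best = [neg] * (budget_bits + 1)
        choice = [0] * (budget_bits + 1)
        for c, val in enumerate(best):
            if val == neg:
                continue  # cost c is unreachable with the layers seen so far
            for b, score in opts.items():
                nc = c + params[l] * b
                if nc <= budget_bits and val + score > new_best[nc]:
                    new_best[nc] = val + score
                    choice[nc] = b
        best = new_best
        choices.append(choice)

    final_cost = max(range(budget_bits + 1), key=lambda c: best[c])
    if best[final_cost] == neg:
        raise ValueError("no bit-width assignment fits the budget")

    # Walk back through the recorded choices to recover each layer's bit-width.
    bits: List[int] = []
    c = final_cost
    for l in reversed(range(len(importance))):
        b = choices[l][c]
        bits.append(b)
        c -= params[l] * b
    return best[final_cost], bits[::-1]


if __name__ == "__main__":
    # Two toy layers with made-up importance values (not results from the paper).
    imp = [{2: 0.1, 4: 0.5, 8: 0.6}, {2: 0.2, 4: 0.7, 8: 0.9}]
    print(solve_bitwidth_ilp(imp, params=[10, 20], budget_bits=120))
```

On this toy input the best assignment is 4 bits for both layers. For a real network, an off-the-shelf ILP solver would replace the DP. The sketch only shows the shape of the problem: once the indicators are fixed, the search is one small optimization instead of an iterative train-and-evaluate loop.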
Findings
Achieves state-of-the-art (SOTA) accuracy on ImageNet under various constraints.
Reduces quantization search time from hours to milliseconds.
Effectively guides mixed-precision quantization using learned importance indicators.
Abstract
The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it hard to determine the optimal bit-width for each layer. Previous works usually resort to iterative search methods on the training set, which consume hundreds or even thousands of GPU-hours. In this study, we reveal that some unique learnable parameters in quantization, namely the scale factors in the quantizer, can serve as importance indicators of a layer, reflecting the contribution of that layer to the final accuracy at certain bit-widths. These importance indicators naturally perceive the numerical transformation during quantization-aware training, which can precisely provide quantization sensitivity metrics of layers. However, a deep network always contains hundreds of such indicators, and training them one by one would lead to an excessive time cost. To overcome this issue, we propose a…
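The abstract treats the quantizer's learnable scale factors as per-layer importance indicators. It also notes that training hundreds of them one by one is too slow. The sketch below illustrates both ideas. It assumes an LSQ-style uniform quantizer, `round(clip(x / s, -Q_N, Q_P)) * s`, with one learnable scale per (layer, bit-width) pair. It also assumes a joint-training schedule that draws an independent bit-width for every layer at each step, so a single run touches every indicator. The sampling schedule, the candidate bit-widths, and all function names (`fake_quantize`, `init_scales`, `sample_config`, `coverage`) are hypothetical. The code only simulates which indicators each step would exercise. It performs no gradient updates.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical candidate bit-widths; the paper's search space may differ.
BIT_CHOICES = (2, 3, 4, 8)


def fake_quantize(x: List[float], scale: float, bits: int, signed: bool = True) -> List[float]:
    """LSQ-style uniform quantize-dequantize: round(clip(x / s, -Q_N, Q_P)) * s.

    `scale` is the learnable step size; under the abstract's view, its trained
    value at a given bit-width acts as that layer's importance indicator.
    """
    if bits < 1 or scale <= 0:
        raise ValueError("bits must be >= 1 and scale must be positive")
    q_n, q_p = (2 ** (bits - 1), 2 ** (bits - 1) - 1) if signed else (0, 2 ** bits - 1)
    # Python's round() is round-half-to-even; real kernels may round differently.
    return [round(min(max(v / scale, -q_n), q_p)) * scale for v in x]


def init_scales(num_layers: int, init: float = 0.1) -> Dict[Tuple[int, int], float]:
    """One (hypothetical) learnable scale per (layer, bit-width) pair."""
    return {(l, b): init for l in range(num_layers) for b in BIT_CHOICES}


def sample_config(num_layers: int, rng: random.Random) -> List[int]:
    """Draw an independent bit-width for every layer. In a joint-training step,
    only the scales indexed by this draw would receive gradient updates."""
    return [rng.choice(BIT_CHOICES) for _ in range(num_layers)]


def coverage(num_layers: int, steps: int, seed: int = 0) -> Counter:
    """Count how often each (layer, bit-width) indicator is exercised."""
    rng = random.Random(seed)
    hits: Counter = Counter()
    for _ in range(steps):
        for l, b in enumerate(sample_config(num_layers, rng)):
            hits[(l, b)] += 1
    return hits


if __name__ == "__main__":
    print(fake_quantize([0.3, -1.0, 5.0], scale=0.25, bits=2))
    hits = coverage(num_layers=4, steps=1000)
    print(len(hits), "of", 4 * len(BIT_CHOICES), "indicators exercised")
```

The design choice illustrated is that per-step sampling spreads updates across all (layer, bit-width) pairs in one training run. The alternative would be one quantization-aware training run per indicator.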
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Advanced Vision and Imaging
