Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation
Zechun Liu, Kwang-Ting Cheng, Dong Huang, Eric Xing, and Zhiqiang Shen

TL;DR
This paper introduces N2UQ, a quantization method that combines the accuracy of nonuniform quantization with the inference efficiency of uniform quantization. It is trained with a novel generalized straight-through estimator (G-STE) and entropy regularization.
Contribution
N2UQ retains the accuracy of nonuniform quantization while remaining hardware-friendly. It does so by learning the input thresholds and training with a generalized straight-through estimator.
Findings
N2UQ outperforms state-of-the-art nonuniform methods by 0.5-1.7% on ImageNet.
The method achieves high accuracy with uniformly quantized weights and activations.
The approach is as efficient and hardware-friendly as uniform quantization at inference.
Abstract
The nonuniform quantization strategy for compressing neural networks usually achieves better performance than its counterpart, i.e., uniform strategy, due to its superior representational capacity. However, many nonuniform quantization methods overlook the complicated projection process in implementing the nonuniformly quantized weights/activations, which incurs non-negligible time and space overhead in hardware deployment. In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient as the uniform quantization for model inference. We achieve this through learning the flexible in-equidistant input thresholds to better fit the underlying distribution while quantizing these real-valued inputs into equidistant output levels. To train the quantized network…
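The forward mapping described in the abstract, non-equidistant input thresholds feeding equidistant output levels, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `nonuniform_to_uniform_quantize` and the threshold values are hypothetical, the thresholds are fixed here rather than learned, and the G-STE backward pass used for training is not shown.

```python
import numpy as np


def nonuniform_to_uniform_quantize(x, thresholds):
    """Map real values to equidistant integer levels.

    `thresholds` is a strictly increasing 1-D array; its spacing may be
    non-equidistant. With k thresholds the output levels are 0, 1, ..., k,
    so a 2-bit quantizer uses 3 thresholds and 4 levels.
    (Hypothetical helper written for illustration only.)
    """
    thresholds = np.asarray(thresholds, dtype=np.float64)
    if np.any(np.diff(thresholds) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    # side="right" counts how many thresholds are <= x, which is the level.
    return np.searchsorted(thresholds, x, side="right")


# Assumed 2-bit example: thresholds are unevenly spaced, levels are 0..3.
thresholds = np.array([0.1, 0.3, 1.0])
x = np.array([-0.5, 0.05, 0.2, 0.3, 0.9, 2.0])
levels = nonuniform_to_uniform_quantize(x, thresholds)
print(levels)  # [0 0 1 2 2 3]
```

Because the outputs are plain integers on a uniform grid, downstream arithmetic can use ordinary fixed-point operations. The nonuniformity lives only in where the input intervals are cut.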
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Image Enhancement Techniques
