Improving Quantization-aware Training of Low-Precision Network via Block Replacement on Full-Precision Counterpart
Chengting Yu, Shu Yang, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Er-Ping Li

TL;DR
This paper introduces a novel quantization-aware training framework that uses block replacement guided by full-precision models to improve low-precision network training, achieving state-of-the-art results on standard datasets.
Contribution
The proposed method allows low-precision blocks to be guided by full-precision counterparts during training, enhancing representation and gradient estimation.
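As a rough illustration of what block replacement could look like, the sketch below builds a mixed forward pass in which some blocks of a full-precision network are swapped for low-precision ones. The function name `build_mixed_forward`, the choice to replace a prefix of blocks, and the toy blocks are all assumptions made for this example; the summary above does not specify which blocks are replaced or how the mixed networks are trained.

```python
def build_mixed_forward(fp_blocks, lp_blocks, num_replaced):
    """Compose a forward pass from two aligned lists of blocks.

    Assumption for illustration: the first `num_replaced` blocks come from
    the low-precision list, the rest from the full-precision list.
    """
    if len(fp_blocks) != len(lp_blocks):
        raise ValueError("block lists must have the same length")
    if not 0 <= num_replaced <= len(fp_blocks):
        raise ValueError("num_replaced is out of range")
    blocks = list(lp_blocks[:num_replaced]) + list(fp_blocks[num_replaced:])

    def forward(x):
        # Apply the selected blocks in order.
        for block in blocks:
            x = block(x)
        return x

    return forward


# Toy stand-ins (hypothetical): each "low-precision" block rounds its output.
fp_blocks = [lambda x: x * 1.4, lambda x: x + 0.25]
lp_blocks = [lambda x: float(round(x * 1.4)), lambda x: float(round(x + 0.25))]

full_precision = build_mixed_forward(fp_blocks, lp_blocks, 0)
mixed = build_mixed_forward(fp_blocks, lp_blocks, 1)
low_precision = build_mixed_forward(fp_blocks, lp_blocks, 2)
```

In this toy setup the mixed network sends the output of a low-precision block through a full-precision block, which is one way the full-precision counterpart could shape both the forward signal and the gradients that reach the low-precision block.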
Findings
Achieves state-of-the-art results for 2-, 3-, and 4-bit quantization.
Compatible with most existing QAT methods.
Improves gradient estimation and representation in low-precision networks.
Abstract
Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase incorporates the simulation of the low-precision computation to optimize the quantization parameters in alignment with the task goals. However, direct training of low-precision networks generally faces two obstacles: 1. The low-precision model exhibits limited representation capabilities and cannot directly replicate full-precision calculations, which constitutes a deficiency compared to full-precision alternatives; 2. Non-ideal deviations during gradient propagation are a common consequence of employing pseudo-gradients as approximations in derived quantized functions. In this paper, we propose a general QAT framework for alleviating the aforementioned concerns by permitting the forward and backward processes of the low-precision network to be guided by the full-precision…
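The two ingredients the abstract refers to, simulated low-precision computation and pseudo-gradients, can be sketched for a single scalar as follows. This is a generic uniform quantizer with a straight-through estimator, a common choice in QAT, and not necessarily the quantizer used in the paper; the function names, the clipping range [-1, 1], and the default bit width are assumptions for illustration.

```python
def fake_quantize(x, num_bits=2, x_min=-1.0, x_max=1.0):
    """Simulate uniform quantization in floating point.

    The value is clipped to [x_min, x_max] and snapped to one of
    2**num_bits evenly spaced levels.
    """
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    clipped = min(max(x, x_min), x_max)
    return round((clipped - x_min) / scale) * scale + x_min


def ste_grad(x, upstream_grad, x_min=-1.0, x_max=1.0):
    """Straight-through pseudo-gradient of fake_quantize with respect to x.

    Rounding has zero gradient almost everywhere, so the estimator treats
    it as the identity inside the clipping range and returns zero outside.
    """
    return upstream_grad if x_min <= x <= x_max else 0.0
```

With 2 bits the representable values are -1, -1/3, 1/3, and 1, so an input of -0.4 is mapped to -1/3 while the pseudo-gradient still passes through unchanged. The gap between this estimate and the true (zero) gradient is the kind of deviation the abstract describes.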
Taxonomy
Topics: Optical Systems and Laser Technology · Image Processing Techniques and Applications
