Improving Quantization-aware Training of Low-Precision Network via Block Replacement on Full-Precision Counterpart

Chengting Yu; Shu Yang; Fengzhao Zhang; Hanzhi Ma; Aili Wang; Er-Ping Li

arXiv:2412.15846·cs.LG·December 23, 2024

TL;DR

This paper introduces a novel quantization-aware training framework that uses block replacement guided by full-precision models to improve low-precision network training, achieving state-of-the-art results on standard datasets.

Contribution

The proposed method allows low-precision blocks to be guided by full-precision counterparts during training, enhancing representation and gradient estimation.
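The idea of swapping low-precision blocks into a full-precision network can be illustrated with a toy forward pass. This is only a sketch under stated assumptions: the page does not say how blocks are chosen, how the mixed network is trained, or which quantizer is used, so the scalar "blocks", the `quantize` helper, and the names `make_fp_block`, `make_lp_block`, and `mixed_forward` are all hypothetical.

```python
from typing import Callable, List, Set

Block = Callable[[float], float]


def quantize(x: float, bits: int, x_max: float = 1.0) -> float:
    """Uniform symmetric fake quantization of one scalar (assumed quantizer, bits >= 2)."""
    levels = 2 ** (bits - 1) - 1            # number of positive integer levels
    scale = x_max / levels                  # step size between adjacent levels
    q = max(-levels, min(levels, round(x / scale)))  # round, then clip
    return q * scale                        # map back to the real-valued range


def make_fp_block(w: float) -> Block:
    """Toy full-precision block: multiply the input by a weight."""
    return lambda x: w * x


def make_lp_block(w: float, bits: int) -> Block:
    """Toy low-precision counterpart: quantize both weight and input."""
    return lambda x: quantize(w, bits) * quantize(x, bits)


def mixed_forward(fp_blocks: List[Block], lp_blocks: List[Block],
                  replaced: Set[int], x: float) -> float:
    """Run the full-precision path, using the low-precision block at each replaced index."""
    for i, (fp, lp) in enumerate(zip(fp_blocks, lp_blocks)):
        x = lp(x) if i in replaced else fp(x)
    return x


weights = [0.5, 0.8, 0.9]
fp_blocks = [make_fp_block(w) for w in weights]
lp_blocks = [make_lp_block(w, bits=4) for w in weights]

full_precision = mixed_forward(fp_blocks, lp_blocks, set(), 0.7)      # no block replaced
mixed = mixed_forward(fp_blocks, lp_blocks, {2}, 0.7)                 # last block replaced
low_precision = mixed_forward(fp_blocks, lp_blocks, {0, 1, 2}, 0.7)   # every block replaced
```

In the mixed case, the low-precision block receives an input computed by full-precision blocks, which is one plausible reading of a low-precision block being "guided" by its full-precision counterpart.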

Findings

1. Achieves state-of-the-art results for 2-, 3-, and 4-bit quantization.

2. Compatible with most existing QAT methods.

3. Improves gradient estimation and representation in low-precision networks.

Abstract

Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase incorporates the simulation of the low-precision computation to optimize the quantization parameters in alignment with the task goals. However, direct training of low-precision networks generally faces two obstacles: 1. The low-precision model exhibits limited representation capabilities and cannot directly replicate full-precision calculations, which constitutes a deficiency compared to full-precision alternatives; 2. Non-ideal deviations during gradient propagation are a common consequence of employing pseudo-gradients as approximations in derived quantized functions. In this paper, we propose a general QAT framework for alleviating the aforementioned concerns by permitting the forward and backward processes of the low-precision network to be guided by the full-precision…
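The second obstacle, pseudo-gradients, can be made concrete with a small example. The sketch below assumes a uniform symmetric quantizer and the straight-through estimator (STE), a common choice in QAT; the abstract does not say which quantizer or estimator the paper uses, and the names `fake_quantize` and `ste_grad` are hypothetical.

```python
def fake_quantize(x: float, bits: int, x_max: float = 1.0) -> float:
    """Simulate low-precision storage: round x to a uniform grid, then clip (bits >= 2)."""
    levels = 2 ** (bits - 1) - 1            # number of positive integer levels
    scale = x_max / levels                  # step size of the grid
    q = max(-levels, min(levels, round(x / scale)))
    return q * scale


def ste_grad(x: float, upstream: float, x_max: float = 1.0) -> float:
    """Straight-through estimator (assumed): treat rounding as the identity in
    the backward pass, and block the gradient where the input was clipped."""
    return upstream if -x_max <= x <= x_max else 0.0


# Forward pass: with 2 bits the grid is {-1, 0, 1}, so nearby inputs collapse.
y_small = fake_quantize(0.3, bits=2)
y_large = fake_quantize(0.8, bits=2)

# Backward pass: the true derivative of rounding is zero almost everywhere,
# so the STE value below is an approximation rather than the exact gradient.
g_inside = ste_grad(0.3, upstream=2.0)
g_clipped = ste_grad(1.5, upstream=2.0)
```

The gap between the estimated gradient and the true, almost-everywhere-zero derivative is the kind of deviation the abstract refers to.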

Taxonomy

Topics: Optical Systems and Laser Technology · Image Processing Techniques and Applications