Attention Round for Post-Training Quantization
Huabin Diao, Gongyan Li, Shaoyun Xu, Yuexing Hao

TL;DR
This paper introduces Attention Round, a novel post-training quantization method that improves the accuracy of quantized models by probabilistically mapping each parameter to multiple quantized values. It achieves results comparable to quantization-aware training while using little calibration data and time.
Contribution
The paper proposes Attention Round, a new quantization technique that considers every possible quantized value for each parameter, rather than only the two nearest ones, to improve the accuracy of post-training quantization.
Findings
Achieves quantization performance comparable to QAT on ResNet18 and MobileNetV2.
Requires only 1,024 data samples and 10 minutes for effective quantization.
Handles mixed-precision quantization by using lossy coding length to assign bit widths to layers.
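The summary does not say how lossy coding length is turned into bit widths. The sketch below is a minimal illustration under stated assumptions. It uses the coding-length function of Ma et al. (2007), L(W) = (m+n)/2 · log2 det(I + n/(ε²m) W Wᵀ) for m vectors of dimension n, computed per layer. The distortion ε, the per-weight normalization, the bit choices, and the ranking rule in the hypothetical `allocate_bits` helper are all illustrative assumptions, not the paper's procedure.

```python
import numpy as np


def coding_length(W: np.ndarray, eps: float) -> float:
    """Lossy coding length, in bits, of the m columns of an n x m matrix W:

        L(W) = (m + n) / 2 * log2 det(I_n + n / (eps^2 * m) * W W^T)

    (form assumed from Ma et al., 2007; the separate mean-coding term is omitted).
    """
    n, m = W.shape
    c = n / (eps ** 2 * m)
    # Sylvester's identity: det(I_n + c W W^T) == det(I_m + c W^T W),
    # so factor whichever Gram matrix is smaller.
    gram = W @ W.T if n <= m else W.T @ W
    _, logdet = np.linalg.slogdet(np.eye(gram.shape[0]) + c * gram)
    return 0.5 * (m + n) * logdet / np.log(2.0)


def allocate_bits(layers: dict, eps: float, choices=(2, 4, 8)) -> dict:
    """Hypothetical rule: rank layers by coding length per weight and split the
    ranking into len(choices) groups; harder-to-code layers get more bits."""
    scores = {}
    for name, W in layers.items():
        cols = W.reshape(W.shape[0], -1).T          # one column per output filter
        scores[name] = coding_length(cols, eps) / W.size
    ranked = sorted(scores, key=scores.get)          # lowest score first
    return {name: choices[i * len(choices) // len(ranked)]
            for i, name in enumerate(ranked)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    layers = {"small": 0.01 * z, "large": 1.0 * z}   # toy weights, same shape
    print(allocate_bits(layers, eps=0.1, choices=(4, 8)))
```

Ranking by coding length per weight is one plausible reading of the idea: a layer whose weights need more bits to describe at distortion ε is assumed to be more sensitive to low precision and therefore receives a wider bit width.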
Abstract
At present, neural network quantization methods fall mainly into two categories: post-training quantization (PTQ) and quantization-aware training (QAT). Post-training quantization needs only a small part of the data to complete the quantization process, but the resulting quantized model does not perform as well as one produced by quantization-aware training. This paper presents a novel quantization method called Attention Round. During quantization, this method gives a parameter w the opportunity to be mapped to any of the possible quantized values, rather than only to the two quantized values nearest to w. The probability of being mapped to a given quantized value is negatively correlated with the distance between that value and w, and decays as a Gaussian function of that distance. In addition, this paper uses the lossy coding length as a measure to assign bit widths to the different layers…
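The rounding rule described in the abstract can be sketched as sampling each weight w onto grid value q_k with probability proportional to exp(-(q_k - w)²/(2σ²)). This is a minimal sketch, assuming a symmetric uniform b-bit grid with step `scale` and treating the Gaussian width `sigma` as a free hyperparameter; neither is specified in the abstract. The helper names (`uniform_grid`, `attention_round_probs`, `attention_round`) are hypothetical. The sketch covers only the sampling rule, not how the paper tunes weights on its calibration data.

```python
import numpy as np


def uniform_grid(bits: int, scale: float) -> np.ndarray:
    """Symmetric signed grid {-2^(b-1), ..., 2^(b-1) - 1} * scale (assumed)."""
    levels = np.arange(-(2 ** (bits - 1)), 2 ** (bits - 1))
    return levels * scale


def attention_round_probs(w: np.ndarray, grid: np.ndarray, sigma: float) -> np.ndarray:
    """P(w_i -> q_k), proportional to exp(-(q_k - w_i)^2 / (2 sigma^2)).

    Returns shape (w.size, grid.size); every row sums to 1.
    """
    d = w.reshape(-1, 1) - grid.reshape(1, -1)       # distance to every grid value
    logits = -(d ** 2) / (2.0 * sigma ** 2)          # Gaussian decay, in log space
    logits -= logits.max(axis=1, keepdims=True)      # stabilise before exp
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def attention_round(w: np.ndarray, grid: np.ndarray, sigma: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Sample one grid value per weight from the Gaussian-decayed distribution."""
    p = attention_round_probs(w, grid, sigma)
    cdf = p.cumsum(axis=1)
    u = rng.random((p.shape[0], 1))                  # one uniform draw per weight
    idx = np.minimum((u > cdf).sum(axis=1), grid.size - 1)  # inverse-CDF sampling
    return grid[idx].reshape(w.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = uniform_grid(bits=4, scale=0.1)           # values -0.8, -0.7, ..., 0.7
    w = np.array([0.23, -0.41])
    print(attention_round(w, grid, sigma=0.1, rng=rng))   # random, usually near w
    print(attention_round(w, grid, sigma=1e-3, rng=rng))  # round-to-nearest: 0.2, -0.4
```

As `sigma` shrinks, the distribution collapses onto the nearest grid value and the rule reduces to round-to-nearest. A larger `sigma` gives more distant grid values a non-negligible chance, which is what separates this rule from rounding to one of the two neighbouring values.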
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Batch Normalization · 1x1 Convolution · Pointwise Convolution · Depthwise Convolution · Convolution · Depthwise Separable Convolution · Average Pooling · Inverted Residual Block · Attentive Walk-Aggregating Graph Neural Network
