RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization
Hongyi Yao, Pu Li, Jian Cao, Xiangcheng Liu, Chenying Xie, and Bingzhang Wang

TL;DR
RAPQ is a Power-of-Two low-bit post-training quantization (PTQ) method that dynamically adjusts scale factors across the whole network to recover accuracy without retraining, which makes it suitable for hardware accelerators that replace multiplication with bit-shifts.
Contribution
RAPQ introduces a dynamic Power-of-Two scale adjustment framework for low-bit PTQ. It trades off rounding and clipping errors across the whole network instead of fixing each scale statically, layer by layer.
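The rounding/clipping trade-off can be illustrated with a small sketch. This is not RAPQ itself: it is the static, per-tensor exponent search that the paper contrasts against, written in plain Python. The function names (`quantize`, `error_split`, `best_exponent`), the symmetric signed grid, and the squared-error metric are assumptions made for illustration.

```python
def quantize(x, exponent, bits=4):
    """Quantize x on a symmetric signed grid with scale 2**exponent (illustrative)."""
    scale = 2.0 ** exponent
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))  # round, then clip to the grid
    return q * scale


def error_split(values, exponent, bits=4):
    """Return (rounding_error, clipping_error) as summed squared errors."""
    scale = 2.0 ** exponent
    lo = -(2 ** (bits - 1)) * scale
    hi = (2 ** (bits - 1) - 1) * scale
    rounding = 0.0
    clipping = 0.0
    for x in values:
        err = (x - quantize(x, exponent, bits)) ** 2
        if lo <= x <= hi:
            rounding += err   # value is inside the representable range
        else:
            clipping += err   # value falls outside the range and is saturated
    return rounding, clipping


def best_exponent(values, candidates, bits=4):
    """Pick the exponent with the smallest total squared error (static search)."""
    return min(candidates, key=lambda e: sum(error_split(values, e, bits)))


values = [-0.9, -0.3, 0.05, 0.2, 0.7]
for e in range(-6, 1):
    r, c = error_split(values, e)
    print(f"scale=2^{e}: rounding={r:.4f} clipping={c:.4f}")
print("best exponent:", best_exponent(values, range(-6, 1)))
```

Because the only available scales are powers of two, each step of the exponent doubles or halves the representable range. A small exponent gives a fine grid but saturates large values; a large exponent avoids saturation but rounds coarsely.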
Findings
Reaches 65% accuracy on ImageNet with ResNet-18 using INT2 weights and INT4 activations.
Reaches 48% accuracy on ImageNet with MobileNetV2 under the same INT2-weight, INT4-activation setting.
Shows that Power-of-Two PTQ can match state-of-the-art accuracy while keeping hardware-friendly constraints.
Abstract
We introduce a Power-of-Two low-bit post-training quantization (PTQ) method for deep neural networks that meets hardware requirements and does not call for lengthy retraining. Power-of-Two quantization can convert the multiplications introduced by quantization and dequantization into bit-shifts, which many efficient accelerators adopt. However, Power-of-Two scale factors have fewer candidate values, which leads to more rounding or clipping errors. We propose a novel Power-of-Two PTQ framework, dubbed RAPQ, which dynamically adjusts the Power-of-Two scales of the whole network instead of statically determining them layer by layer. It can theoretically trade off the rounding error and clipping error of the whole network. Meanwhile, the reconstruction method in RAPQ is based on the BN information of every unit. Extensive experiments on ImageNet prove the excellent performance of our…
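The claim that a Power-of-Two scale turns multiplication into a bit-shift can be checked with a short sketch. The helper names (`rescale_mul`, `rescale_shift`) are hypothetical, and flooring is assumed as the rounding mode for simplicity; real accelerators may use a rounding shift instead.

```python
import math


def rescale_mul(acc, exponent):
    """Reference path: rescale an integer accumulator by multiplying with 2**exponent."""
    return math.floor(acc * (2.0 ** exponent))


def rescale_shift(acc, exponent):
    """Power-of-Two path: the same rescaling done with integer shifts only."""
    if exponent >= 0:
        return acc << exponent
    # Python's >> on ints is an arithmetic shift, i.e. floor division by 2**k.
    return acc >> -exponent


# The two paths agree for positive and negative accumulators.
for acc in (-1000, -37, -1, 0, 1, 37, 1000):
    for exponent in range(-5, 4):
        assert rescale_mul(acc, exponent) == rescale_shift(acc, exponent)
```

A scale that is not a power of two would need a true multiplier (or a multiply followed by a shift), which is the hardware cost that Power-of-Two quantization avoids.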
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Pointwise Convolution · Depthwise Convolution · Depthwise Separable Convolution · Batch Normalization · 1x1 Convolution · Average Pooling · Inverted Residual Block · Convolution
