RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization

Hongyi Yao, Pu Li, Jian Cao, Xiangcheng Liu, Chenying Xie, and Bingzhang Wang

arXiv:2204.12322·cs.CV·September 27, 2022

TL;DR

RAPQ is a Power-of-Two low-bit post-training quantization method that dynamically adjusts scale factors to improve neural network accuracy without retraining, which makes it suitable for hardware accelerators that implement scaling as bit-shifts.

Contribution

The paper introduces a low-bit PTQ framework that dynamically adjusts the Power-of-Two scales of the whole network, trading off rounding and clipping errors instead of fixing the scales statically layer by layer.

Findings

01

Achieves 65% accuracy on ResNet-18 with INT2 weights and INT4 activations.

02

Attains 48% accuracy on MobileNetV2 with the same INT2-weight, INT4-activation setting.

03

Shows that Power-of-Two PTQ can match state-of-the-art accuracy under hardware-friendly constraints.

Abstract

We introduce a Power-of-Two low-bit post-training quantization (PTQ) method for deep neural networks that meets hardware requirements and does not call for lengthy retraining. Power-of-Two quantization converts the multiplications introduced by quantization and dequantization into bit-shifts, which many efficient accelerators adopt. However, Power-of-Two scale factors have fewer candidate values, which leads to more rounding or clipping errors. We propose a novel Power-of-Two PTQ framework, dubbed RAPQ, which dynamically adjusts the Power-of-Two scales of the whole network instead of statically determining them layer by layer. It can theoretically trade off the rounding error and clipping error of the whole network. Meanwhile, the reconstruction method in RAPQ is based on the BN information of every unit. Extensive experiments on ImageNet prove the excellent performance of our…
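To make the abstract's two claims concrete (scaling by a Power-of-Two reduces to a bit-shift, and restricting scales to powers of two changes the balance between rounding and clipping error), here is a minimal sketch. It is not the RAPQ algorithm: the function names (`po2_scale`, `quantize`, `error_split`, `shift_rescale`) are hypothetical, and the symmetric signed quantizer and squared-error metric are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def po2_scale(scale: float) -> Tuple[float, int]:
    """Snap a positive real-valued scale to the nearest power of two (in log2 space)."""
    exponent = round(math.log2(scale))
    return 2.0 ** exponent, exponent


def quantize(values: List[float], scale: float, bits: int) -> List[int]:
    """Symmetric signed quantization (an assumption): round to the grid, then clip."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [min(max(round(v / scale), qmin), qmax) for v in values]


def error_split(values: List[float], scale: float, bits: int) -> Tuple[float, float]:
    """Return (rounding_error, clipping_error) as sums of squared errors."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    rounding = clipping = 0.0
    for v in values:
        q = round(v / scale)
        q_clipped = min(max(q, qmin), qmax)
        err = (v - q_clipped * scale) ** 2
        if q == q_clipped:
            rounding += err   # value stayed inside the representable range
        else:
            clipping += err   # value was saturated at qmin or qmax
    return rounding, clipping


def shift_rescale(acc: int, exponent: int) -> int:
    """Multiply an integer by 2**exponent using shifts only (a right shift floors)."""
    return acc << exponent if exponent >= 0 else acc >> -exponent


if __name__ == "__main__":
    weights = [0.1, 0.5, -0.9, 2.0]  # toy values, not taken from the paper
    for exponent in (-3, -2, -1):
        r, c = error_split(weights, 2.0 ** exponent, bits=4)
        print(f"scale=2^{exponent}: rounding={r:.4f} clipping={c:.4f}")
```

In this toy setting, a real-valued scale such as 0.3 has to snap to either 0.25 or 0.5. The smaller choice saturates more large values (clipping error), while the larger one coarsens the grid (rounding error). That coarse choice is the trade-off the abstract says RAPQ balances across the whole network.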

Code & Models

Repositories

billamihom/rapq (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Pointwise Convolution · Depthwise Convolution · Depthwise Separable Convolution · Batch Normalization · 1x1 Convolution · Average Pooling · Inverted Residual Block · Convolution