Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen

TL;DR
The paper introduces Incremental Network Quantization (INQ), a method for converting pre-trained full-precision CNNs into low-precision models whose weights are powers of two or zero. It does this with minimal accuracy loss, enabling efficient deployment of neural networks.
Contribution
INQ is a novel iterative approach that combines weight partitioning, group-wise quantization, and re-training to achieve near-lossless low-precision CNNs.
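The three operations can be illustrated with a toy loop, sketched below. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions. Partition is by weight magnitude, with larger weights quantized first. The accumulated quantized portions are 50%, 75%, 87.5% and 100%, which is our assumption about the schedule. The codebook and rounding rule follow our reading of the paper. A linear least-squares model trained with full-batch gradient descent stands in for a CNN retrained with SGD. The names `quantize` and `inq`, and the toy weights, are hypothetical.

```python
import math
import random


def quantize(w: float, n1: int, n2: int) -> float:
    """Map w onto {0, +/-2^n2, ..., +/-2^n1}; assumed INQ-style rounding rule."""
    a = abs(w)
    if a < 2.0 ** n2 / 2.0:  # too small for the smallest power of two
        return 0.0
    # Picks 2^k with 3*2^k/4 <= |w| < 3*2^k/2, clamped to [n2, n1].
    k = max(n2, min(n1, math.floor(math.log2(4.0 * a / 3.0))))
    return math.copysign(2.0 ** k, w)


def inq(weights, xs, ys, bits=5, portions=(0.5, 0.75, 0.875, 1.0),
        lr=0.05, epochs=100):
    """Toy INQ loop for a linear model y ~ sum_j w_j * x_j (illustrative only)."""
    w = list(weights)  # copy of the "pre-trained" full-precision weights
    n1 = math.floor(math.log2(4.0 * max(abs(v) for v in w) / 3.0))
    n2 = n1 + 1 - 2 ** (bits - 1) // 2
    frozen = [False] * len(w)  # True once a weight is quantized and fixed
    for p in portions:
        # 1) Weight partition: pick the largest-magnitude free weights until
        #    a fraction p of all weights has been quantized.
        free = sorted((j for j in range(len(w)) if not frozen[j]),
                      key=lambda j: abs(w[j]), reverse=True)
        for j in free[: round(p * len(w)) - sum(frozen)]:
            # 2) Group-wise quantization of the selected group.
            w[j] = quantize(w[j], n1, n2)
            frozen[j] = True
        if all(frozen):
            break
        # 3) Re-training: full-batch gradient descent on mean squared error,
        #    updating only the weights that are still full precision.
        for _ in range(epochs):
            grad = [0.0] * len(w)
            for x, y in zip(xs, ys):
                err = sum(wj * xj for wj, xj in zip(w, x)) - y
                for j, xj in enumerate(x):
                    grad[j] += 2.0 * err * xj / len(xs)
            for j in range(len(w)):
                if not frozen[j]:
                    w[j] -= lr * grad[j]
    return w


# Usage on synthetic data (hypothetical toy weights, not from the paper).
random.seed(0)
w_true = [0.9, -0.42, 0.3, 0.11, -0.07, 0.05, 0.02, -0.013]
xs = [[random.gauss(0.0, 1.0) for _ in w_true] for _ in range(64)]
ys = [sum(a * b for a, b in zip(w_true, x)) for x in xs]
w_q = inq(w_true, xs, ys)
print(w_q)  # every entry is 0 or +/- a power of two
```

Quantized weights are frozen, and only the remaining ones are updated, so the still-free group can absorb the error introduced at each step. In a real network, the same mask would be applied to the SGD updates of every layer.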
Findings
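At 5-bit quantization, the tested ImageNet models (AlexNet, VGG-16, GoogleNet and ResNets) achieve higher accuracy than their 32-bit floating-point references.
ResNet-18 with 4-bit, 3-bit and 2-bit (ternary) weights achieves accuracy that is improved over, or very close to, its 32-bit floating-point baseline.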
Combining pruning with INQ yields further efficiency gains.
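The bit widths above can be made concrete by enumerating the per-layer codebook. We read the paper as using b bits per weight: b−1 bits index 2^(b−1) nonzero levels ±2^n2, …, ±2^n1, and one extra bit flags zero. Here n1 = floor(log2(4s/3)) for the layer maximum s = max|W|, and n2 = n1 + 1 − 2^(b−1)/2. Treat this as an assumption, not a quotation. Under it, b = 2 collapses to the ternary set {−2^n1, 0, +2^n1}. The function name `inq_codebook` and the layer maximum 0.9 are hypothetical.

```python
import math


def inq_codebook(max_abs: float, bits: int) -> list[float]:
    """Sorted codebook {0, +/-2^n2, ..., +/-2^n1} for one layer (assumed rule)."""
    n1 = math.floor(math.log2(4.0 * max_abs / 3.0))
    n2 = n1 + 1 - 2 ** (bits - 1) // 2
    mags = [2.0 ** k for k in range(n2, n1 + 1)]  # nonzero magnitudes
    return sorted([0.0] + mags + [-m for m in mags])


for b in (2, 3, 4, 5):
    print(b, len(inq_codebook(0.9, b)))  # prints 2 3, then 3 5, 4 9, 5 17
```

Each extra bit doubles the number of nonzero levels by extending the range downward (smaller n2). The largest level 2^n1 stays tied to the layer's biggest weight.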
Abstract
This paper presents incremental network quantization (INQ), a novel method for efficiently converting any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained to be either powers of two or zero. Unlike existing methods, which suffer from noticeable accuracy loss, INQ has the potential to resolve this issue, benefiting from two innovations. On one hand, we introduce three interdependent operations, namely weight partition, group-wise quantization and re-training. A well-proven measure is employed to divide the weights in each layer of a pre-trained CNN model into two disjoint groups. The weights in the first group are responsible for forming a low-precision base, and are thus quantized by a variable-length encoding method. The weights in the other group are responsible for compensating for the accuracy…
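The "powers of two or zero" constraint can be written per layer l as a codebook plus a rounding rule. The block below is our reconstruction from the paper, with s as the layer's largest absolute weight and b as the bit width; check it against the original before relying on it. Here α and β are adjacent elements of the sorted nonnegative part of P_l, with α = 0 when β = 2^{n_2}.

```latex
P_l = \{\pm 2^{n_1}, \dots, \pm 2^{n_2}, 0\}, \qquad
n_1 = \left\lfloor \log_2 \tfrac{4s}{3} \right\rfloor, \quad
s = \max_{i,j} |W_l(i,j)|, \qquad
n_2 = n_1 + 1 - \frac{2^{b-1}}{2}

\widehat{W}_l(i,j) =
\begin{cases}
\beta \, \operatorname{sgn}\big(W_l(i,j)\big) & \text{if } \frac{\alpha + \beta}{2} \le |W_l(i,j)| < \frac{3\beta}{2}, \\
0 & \text{otherwise.}
\end{cases}
```

Because 2^{n_1} ≤ 4s/3 < 2^{n_1+1}, every weight satisfies |W_l(i,j)| ≤ s < 3·2^{n_1}/2, so no weight falls above the top level. Weights smaller than 2^{n_2}/2 map to zero.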
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
Methods: Pruning · 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Grouped Convolution · Dropout
