Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

Aojun Zhou; Anbang Yao; Yiwen Guo; Lin Xu; Yurong Chen

arXiv:1702.03044·cs.CV·August 28, 2017·493 cites

TL;DR

The paper introduces Incremental Network Quantization (INQ), a method to convert full-precision CNNs into low-precision models with minimal accuracy loss, enabling efficient deployment of neural networks.

Contribution

INQ is a novel iterative approach that combines weight partitioning, group-wise quantization, and re-training to achieve near-lossless low-precision CNNs.
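The partition, quantize, re-train cycle can be sketched on a toy linear layer in pure Python. This is a minimal illustration under assumptions this page does not state: weights are partitioned by magnitude (largest first), the accumulated quantized portions follow a hypothetical 50% / 75% / 100% schedule, re-training is plain gradient descent on squared error, and `quantize` uses a power-of-two rounding rule recalled from the full paper. All function names are hypothetical, not the authors' code.

```python
import math
import random

def quantize(w, n1, n2):
    """Round w to {±2^n1, ..., ±2^n2, 0} (assumed INQ-style rule)."""
    if abs(w) < 2.0 ** (n2 - 1):
        return 0.0
    k = min(max(math.floor(math.log2(4 * abs(w) / 3)), n2), n1)
    return math.copysign(2.0 ** k, w)

def retrain(w, frozen, xs, ys, lr=0.05, steps=200):
    """Gradient descent on mean squared error; frozen (quantized) weights never move."""
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(xs)
        w = [wi if j in frozen else wi - lr * g
             for j, (wi, g) in enumerate(zip(w, grad))]
    return w

def inq(w, xs, ys, portions=(0.5, 0.75, 1.0), b=5):
    """Incrementally quantize a pre-trained weight vector w (toy, single layer)."""
    w = list(w)                                   # do not mutate the caller's weights
    s = max(abs(v) for v in w)
    n1 = math.floor(math.log2(4 * s / 3))          # largest exponent in the codebook
    n2 = n1 + 1 - 2 ** (b - 1) // 2                # smallest exponent for bit-width b
    frozen = set()
    for p in portions:
        target = round(p * len(w))
        # Weight partition: the largest still-free weights join the quantized group.
        free = sorted((j for j in range(len(w)) if j not in frozen),
                      key=lambda j: -abs(w[j]))
        for j in free[:target - len(frozen)]:
            w[j] = quantize(w[j], n1, n2)          # group-wise quantization
            frozen.add(j)
        if len(frozen) < len(w):
            w = retrain(w, frozen, xs, ys)         # re-train the remaining float group
    return w

if __name__ == "__main__":
    random.seed(0)
    true_w = [0.9, -0.45, 0.2, 0.07]
    xs = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(50)]
    ys = [sum(a * c for a, c in zip(true_w, x)) for x in xs]
    print(inq(true_w, xs, ys))
```

Quantized weights stay frozen in later rounds, so only the still-floating group adapts to the error introduced by each quantization step; after the last round every weight is a power of two or zero.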

Findings

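01. On ImageNet, the 5-bit INQ models achieve higher accuracy than their 32-bit floating-point references.

02. ResNet-18 quantized to 4-bit, 3-bit, and 2-bit (ternary) weights reaches improved or very similar accuracy compared with its 32-bit floating-point baseline.

03. Combining network pruning with INQ yields further efficiency gains.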
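The bit-widths in these findings can be related to codebook size with a small sketch. It assumes the codebook {±2^n1, …, ±2^n2, 0} with n1 = ⌊log2(4s/3)⌋ and n2 = n1 + 1 − 2^(b−1)/2, where s is the layer's largest absolute weight and b the bit-width; this parameterization comes from our reading of the full paper rather than this page, and `inq_codebook` is a hypothetical helper. Under that assumption, 2 bits yields a ternary codebook and each extra bit doubles the number of nonzero levels.

```python
import math

def inq_codebook(s, b):
    """Levels {±2^n1, ..., ±2^n2, 0} for max |w| = s and bit-width b >= 2 (assumed form)."""
    n1 = math.floor(math.log2(4 * s / 3))     # largest exponent, covers max |w|
    n2 = n1 + 1 - 2 ** (b - 1) // 2           # 2^(b-2) magnitudes, 2^(b-1) signed levels
    mags = [2.0 ** n for n in range(n2, n1 + 1)]
    return sorted([-m for m in mags] + [0.0] + mags)

for b in (2, 3, 4, 5):
    print(b, len(inq_codebook(0.9, b)))
# prints: 2 3, 3 5, 4 9, 5 17 (one pair per line)
print(inq_codebook(0.9, 2))  # [-1.0, 0.0, 1.0]
```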

Abstract

This paper presents incremental network quantization (INQ), a novel method for efficiently converting any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained to be either powers of two or zero. Unlike existing methods, which suffer from noticeable accuracy loss, INQ has the potential to resolve this issue, benefiting from two innovations. On the one hand, we introduce three interdependent operations, namely weight partition, group-wise quantization and re-training. A well-proven measure is employed to divide the weights in each layer of a pre-trained CNN model into two disjoint groups. The weights in the first group are responsible for forming a low-precision base, so they are quantized by a variable-length encoding method. The weights in the other group are responsible for compensating for the accuracy…
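The abstract does not spell out the "well-proven measure" or the exact quantization thresholds. The sketch below assumes the measure is weight magnitude (large weights form the low-precision base, as in magnitude-based pruning) and assumes a thresholding rule, recalled from the full paper, in which level 2^k absorbs weights with |w| in [0.75·2^k, 1.5·2^k) and weights below 2^(n2−1) become zero. `quantize_weight` and `partition_by_magnitude` are hypothetical names, not the authors' code.

```python
import math

def quantize_weight(w, n1, n2):
    """Map one weight to a level of {±2^n1, ..., ±2^n2, 0} (assumed rule)."""
    a = abs(w)
    if a < 2.0 ** (n2 - 1):        # below half the smallest nonzero level -> zero
        return 0.0
    # Level 2^k covers |w| in [0.75 * 2^k, 1.5 * 2^k), i.e. k = floor(log2(4|w|/3)).
    k = math.floor(math.log2(4 * a / 3))
    k = min(max(k, n2), n1)        # clamp to the codebook's exponent range
    return math.copysign(2.0 ** k, w)

def partition_by_magnitude(weights, fraction):
    """Split indices into (to_quantize, to_retrain); larger |w| are quantized first."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    cut = round(fraction * len(weights))
    return set(order[:cut]), set(order[cut:])

layer = [0.9, -0.41, 0.05, 0.2, -0.003, 0.6]
base_idx, residual_idx = partition_by_magnitude(layer, 0.5)
# n1 = 0, n2 = -7 corresponds to max|w| = 0.9 at 5 bits under the assumed codebook.
base = {i: quantize_weight(layer[i], n1=0, n2=-7) for i in sorted(base_idx)}
print(base)          # {0: 1.0, 1: -0.5, 5: 0.5}
print(residual_idx)  # {2, 3, 4}
```

The indices in `residual_idx` would stay in full precision and be re-trained to compensate for the error introduced by quantizing the base group.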

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: Pruning · 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Grouped Convolution · Dropout