Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss
Sangil Jung, Changyong Son, Seohyung Lee, Jinwoo Son, Youngjun Kwak, Jae-Joon Han, Sung Ju Hwang, Changkyu Choi

TL;DR
This paper introduces a trainable quantizer that learns quantization intervals by minimizing the task loss, enabling low-bit quantization of deep networks with little accuracy loss. Because the quantizer can be trained on a heterogeneous dataset, it also applies to pretrained models whose original training data is unavailable.
Contribution
The proposed quantization-interval-learning (QIL) method optimizes quantization intervals directly against the task loss, achieving state-of-the-art accuracy at low bit-widths and remaining applicable to pretrained models without access to their original training data.
Findings
Maintains full-precision accuracy at 4-bit quantization.
Outperforms existing methods on ImageNet with ResNet and AlexNet.
Effective even when trained on heterogeneous datasets.
Abstract
Reducing bit-widths of activations and weights of deep networks makes it efficient to compute and store them in memory, which is crucial in their deployments to resource-limited devices, such as mobile phones. However, decreasing bit-widths with quantization generally yields drastically degraded accuracy. To tackle this problem, we propose to learn to quantize activations and weights via a trainable quantizer that transforms and discretizes them. Specifically, we parameterize the quantization intervals and obtain their optimal values by directly minimizing the task loss of the network. This quantization-interval-learning (QIL) allows the quantized networks to maintain the accuracy of the full-precision (32-bit) networks with bit-width as low as 4-bit and minimize the accuracy degeneration with further bit-width reduction (i.e., 3 and 2-bit). Moreover, our quantizer can be trained on a…
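The "transform, then discretize" step described in the abstract can be illustrated with a minimal sketch. It assumes one plausible parameterization that this summary does not spell out: the interval is given by a center and a half-width, magnitudes below the interval are pruned to zero, magnitudes above it are clipped to one, and values inside it are mapped linearly. The function name, the argument names, and the choice of 2^(bits-1) - 1 levels per sign are illustrative assumptions, not the authors' definitions.

```python
def quantize_weight(w, center, half_width, bits):
    """Map one weight into [-1, 1] via an interval, then snap it to a level.

    Hypothetical sketch: `center` and `half_width` stand in for the
    interval parameters that would be learned from the task loss.
    Requires half_width > 0 and bits >= 2.
    """
    lo, hi = center - half_width, center + half_width
    levels = 2 ** (bits - 1) - 1       # levels per sign (assumption)
    mag = abs(w)
    if mag < lo:
        t = 0.0                        # below the interval: pruned to zero
    elif mag > hi:
        t = 1.0                        # above the interval: clipped to one
    else:
        t = (mag - lo) / (hi - lo)     # inside: linear map onto [0, 1]
    sign = 1.0 if w >= 0 else -1.0
    # Discretize: round to the nearest of the evenly spaced levels.
    return sign * round(t * levels) / levels


if __name__ == "__main__":
    for w in (0.05, -0.3, 0.7, -2.0):
        print(w, quantize_weight(w, center=0.5, half_width=0.4, bits=3))
```

In an actual training setup the interval parameters would be updated by gradient descent on the task loss. Rounding has zero gradient almost everywhere, so this usually requires an approximation such as a straight-through estimator; the sketch covers only the forward pass.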
Taxonomy
Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
