Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss

Sangil Jung; Changyong Son; Seohyung Lee; Jinwoo Son; Youngjun Kwak; Jae-Joon Han; Sung Ju Hwang; Changkyu Choi

arXiv:1808.05779·cs.CV·November 26, 2018

TL;DR

This paper introduces a trainable quantizer that learns its quantization intervals by minimizing the task loss. Networks quantized this way keep full-precision (32-bit) accuracy down to 4 bits, and the quantizer can also be applied to pretrained models without access to their original training data.

Contribution

The proposed quantization-interval-learning (QIL) method optimizes the quantization intervals directly against the task loss, achieves state-of-the-art accuracy at low bit-widths, and applies to pretrained models without access to their training data.

Findings

01. Maintains full-precision accuracy at 4-bit quantization.

02. Outperforms existing methods on ImageNet with ResNet and AlexNet.

03. Effective even when trained on heterogeneous datasets.

Abstract

Reducing bit-widths of activations and weights of deep networks makes it efficient to compute and store them in memory, which is crucial in their deployments to resource-limited devices, such as mobile phones. However, decreasing bit-widths with quantization generally yields drastically degraded accuracy. To tackle this problem, we propose to learn to quantize activations and weights via a trainable quantizer that transforms and discretizes them. Specifically, we parameterize the quantization intervals and obtain their optimal values by directly minimizing the task loss of the network. This quantization-interval-learning (QIL) allows the quantized networks to maintain the accuracy of the full-precision (32-bit) networks with bit-width as low as 4-bit and minimize the accuracy degeneration with further bit-width reduction (i.e., 3 and 2-bit). Moreover, our quantizer can be trained on a…
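The idea of a quantizer that first transforms a value using a parameterized interval and then discretizes it can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the names `center`, `distance`, `quantize_weight`, `task_loss`, and `search_interval` are hypothetical, the transform is taken to be linear inside the interval with pruning below it and clipping above it, and a grid search stands in for the gradient-based minimization of the task loss that the abstract describes.

```python
import itertools


def quantize_weight(w, center, distance, bits):
    """Transform then discretize one weight (illustrative sketch, bits >= 2).

    The interval [center - distance, center + distance] applies to |w|.
    Assumed behaviour: magnitudes below the interval are pruned to 0,
    magnitudes above it are clipped to 1, and magnitudes inside it are
    mapped linearly onto [0, 1] before rounding.
    """
    lo, hi = center - distance, center + distance
    levels = 2 ** (bits - 1) - 1  # positive levels; the sign is handled separately
    mag = abs(w)
    if mag < lo:
        unit = 0.0
    elif mag > hi:
        unit = 1.0
    else:
        unit = (mag - lo) / (hi - lo)
    sign = (w > 0) - (w < 0)
    return sign * round(unit * levels) / levels


def task_loss(weights, center, distance, bits, data):
    """Squared error of a toy one-neuron linear model with quantized weights."""
    q = [quantize_weight(w, center, distance, bits) for w in weights]
    return sum(
        (sum(qi * xi for qi, xi in zip(q, x)) - y) ** 2 for x, y in data
    )


def search_interval(weights, bits, data, grid):
    """Pick the (center, distance) pair with the lowest task loss.

    Grid search is a stand-in here; it is not the optimizer used in the paper.
    """
    candidates = [
        (c, d) for c, d in itertools.product(grid, grid) if 0 < d <= c
    ]
    return min(
        candidates,
        key=lambda cd: task_loss(weights, cd[0], cd[1], bits, data),
    )


# Toy data: four weights, two (input, target) pairs, 3-bit quantization.
weights = [0.05, -0.3, 0.7, 1.2]
data = [([1.0, 0.0, 1.0, 0.0], 0.6), ([0.0, 1.0, 0.0, 1.0], 0.5)]
grid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
best_center, best_distance = search_interval(weights, 3, data, grid)
```

The point of the sketch is that the interval is chosen by how well the quantized model performs its task, not by how closely the quantized weights approximate the original ones. In a real network the interval parameters would be updated by backpropagation, which requires a gradient approximation for the rounding step.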

Taxonomy

Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax