QKD: Quantization-aware Knowledge Distillation

Jangho Kim; Yash Bhalgat; Jinwon Lee; Chirag Patel; Nojun Kwak

arXiv:1911.12491 · cs.CV · December 2, 2019 · 46 cites

TL;DR

This paper introduces Quantization-aware Knowledge Distillation (QKD), a three-phase method that effectively combines quantization and KD to improve the accuracy of low-precision neural networks on resource-constrained devices.

Contribution

The paper proposes a novel three-phase QKD approach that coordinates quantization and KD, including self-studying, co-studying, and tutoring phases, to enhance low-precision neural network performance.

Findings

01. QKD outperforms existing methods with up to 2.6% accuracy improvement.

02. QKD recovers full-precision accuracy at bit-widths as low as W3A3 on ResNet and W6A6 on MobileNetV2.

03. Extensive evaluations on the ImageNet and CIFAR datasets demonstrate its effectiveness.
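In the findings above, a label such as W3A3 denotes 3-bit weights and 3-bit activations. To make that concrete, here is a minimal sketch of k-bit uniform quantization in plain Python. The function name `quantize_uniform`, the fixed clipping range, and the sample values are all illustrative assumptions; this is a generic uniform quantizer, not necessarily the quantizer used in the paper.

```python
def quantize_uniform(values, bits, lo, hi):
    """Snap each value to the nearest of 2**bits evenly spaced levels in [lo, hi].

    Generic uniform quantizer for illustration only (hypothetical helper,
    not taken from the paper).
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    steps = 2 ** bits - 1            # number of intervals between lo and hi
    step = (hi - lo) / steps         # spacing between adjacent levels
    quantized = []
    for v in values:
        clipped = min(max(v, lo), hi)            # clamp into the representable range
        index = round((clipped - lo) / step)     # index of the nearest level
        quantized.append(lo + index * step)
    return quantized


# Example: 3-bit weights on an assumed range [-1, 1],
# 3-bit activations on an assumed range [0, 4].
weights = [-1.3, -0.4, 0.05, 0.31, 0.9]
activations = [0.0, 0.7, 2.2, 5.0]
w3 = quantize_uniform(weights, bits=3, lo=-1.0, hi=1.0)
a3 = quantize_uniform(activations, bits=3, lo=0.0, hi=4.0)
```

With 3 bits, every output is one of at most 8 distinct levels, which is why such a model has much less representation power than its full-precision counterpart.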

Abstract

Quantization and knowledge distillation (KD) methods are widely used to reduce the memory and power consumption of deep neural networks (DNNs), especially on resource-constrained edge devices. Although their combination is promising for meeting these requirements, it may not work as desired, mainly because the regularization effect of KD further diminishes the already reduced representation power of a quantized model. To address this shortcoming, we propose Quantization-aware Knowledge Distillation (QKD), wherein quantization and KD are carefully coordinated in three phases. First, the Self-studying (SS) phase fine-tunes a quantized low-precision student network without KD to obtain a good initialization. Second, the Co-studying (CS) phase trains a teacher to make it more quantization-friendly and powerful than a fixed teacher. Finally, the Tutoring (TU) phase transfers knowledge from…
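The three phases can be pictured as a schedule that switches the distillation term and teacher updates on or off. The sketch below is a minimal illustration under stated assumptions: the loss is the common "cross-entropy plus temperature-scaled KL" form of KD, which the abstract does not confirm as the paper's exact objective; the names `PHASES`, `student_loss`, `use_kd`, and `train_teacher` are hypothetical; and because the abstract is cut off at the TU phase, the TU flags (KD on, teacher frozen) are an assumption.

```python
import math

# Assumed phase schedule. SS and CS follow the abstract; TU is an assumption
# because the abstract text is truncated at that point.
PHASES = {
    "SS": {"use_kd": False, "train_teacher": False},  # student fine-tuned without KD
    "CS": {"use_kd": True, "train_teacher": True},    # teacher trained alongside student
    "TU": {"use_kd": True, "train_teacher": False},   # assumption: teacher held fixed
}


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of the softmax distribution against an integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def student_loss(student_logits, teacher_logits, label, use_kd, temperature=2.0):
    """Task loss, plus a distillation term when the current phase enables KD."""
    loss = cross_entropy(student_logits, label)
    if use_kd:
        p_teacher = softmax(teacher_logits, temperature)
        p_student = softmax(student_logits, temperature)
        loss += temperature ** 2 * kl_divergence(p_teacher, p_student)
    return loss


# Example: the same logits evaluated under each phase's setting.
student = [1.2, 0.3, -0.5]
teacher = [2.0, 0.1, -1.0]
losses = {
    name: student_loss(student, teacher, label=0, use_kd=flags["use_kd"])
    for name, flags in PHASES.items()
}
```

In the SS setting the loss reduces to plain cross-entropy, so the quantized student is not yet subject to the regularization effect of KD that the abstract identifies as the problem.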


Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Knowledge Distillation · Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Inverted Residual Block · Residual Block