Preparing Lessons: Improve Knowledge Distillation with Better Supervision

Tiancheng Wen; Shenqi Lai; Xueming Qian

arXiv:1911.07471·cs.CV·July 27, 2020·20 cites

TL;DR

This paper introduces two novel methods, Knowledge Adjustment and Dynamic Temperature Distillation, to enhance knowledge distillation by penalizing poor supervision, leading to improved student model performance on multiple datasets.

Contribution

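The paper proposes two approaches, Knowledge Adjustment (KA) and Dynamic Temperature Distillation (DTD), that refine the supervision used in knowledge distillation. They compare favorably with state-of-the-art methods and are complementary to other KD-based methods.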

Findings

01 Improved accuracy on CIFAR-100, CINIC-10, Tiny ImageNet
02 Effective penalization of bad supervision
03 Complementary to other KD methods

Abstract

Knowledge distillation (KD) is widely used to train a compact model under the supervision of a larger model, which can effectively improve the compact model's performance. Previous methods mainly focus on two aspects: 1) training the student to mimic the representation space of the teacher; 2) training the model progressively or adding an extra module such as a discriminator. Knowledge from the teacher is useful, but it is still not exactly right compared with the ground truth. Besides, overly uncertain supervision also influences the result. We introduce two novel approaches, Knowledge Adjustment (KA) and Dynamic Temperature Distillation (DTD), to penalize bad supervision and improve the student model. Experiments on CIFAR-100, CINIC-10 and Tiny ImageNet show that our methods achieve encouraging performance compared with state-of-the-art methods. When combined with other KD-based methods, the performance will be…
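To make the setting concrete, the following is a minimal sketch of a standard temperature-based KD loss in plain Python, with one hypothetical way of correcting a teacher whose top prediction disagrees with the ground truth. The abstract does not spell out how KA or DTD work, so the helper `shift_to_ground_truth`, the default `temperature=4.0`, and the weight `alpha=0.9` are illustrative assumptions and should not be read as the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def shift_to_ground_truth(teacher_probs, label):
    """Hypothetical adjustment (an assumption, not taken from the abstract):
    if the teacher's top class is wrong, swap its probability with the
    probability assigned to the ground-truth class."""
    top = max(range(len(teacher_probs)), key=teacher_probs.__getitem__)
    adjusted = list(teacher_probs)
    if top != label:
        adjusted[top], adjusted[label] = adjusted[label], adjusted[top]
    return adjusted

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.9):
    """Weighted sum of a soft term (KL divergence to the adjusted teacher)
    and a hard term (cross-entropy with the ground-truth label)."""
    p_teacher = shift_to_ground_truth(softmax(teacher_logits, temperature), label)
    p_student = softmax(student_logits, temperature)
    soft = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard

# Example: the teacher wrongly prefers class 1; the true label is class 2.
loss = kd_loss([0.5, 1.0, 2.0], [1.0, 3.0, 2.0], label=2)
```

Because `temperature` is an ordinary argument, a per-sample value could be supplied for each training example, which is the general idea the name Dynamic Temperature Distillation suggests; how the paper chooses that value is not stated in the abstract.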

Code & Models

Repositories

SforAiDl/KD_Lib
pytorch

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning