Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students

Chenglin Yang; Lingxi Xie; Siyuan Qiao; Alan Yuille

arXiv:1805.05551·cs.CV·September 10, 2018·67 cites

TL;DR

This paper argues that, when deep networks are trained in generations, a more tolerant (less strict) teacher that provides softer supervision signals yields more accurate students, even though the teacher itself is often less accurate.

Contribution

A simple way to make the teacher network more tolerant: an extra loss term is added when training the teacher, which in turn improves the accuracy of the students it supervises.

Findings

1. Tolerant teachers produce better students in generational training.
2. Students outperform competitors despite less powerful teachers.
3. Method improves accuracy on CIFAR100 and ILSVRC2012.

Abstract

We focus on the problem of training a deep neural network in generations. The procedure is as follows: to optimize the target network (student), another network (teacher) with the same architecture is first trained and then used to provide part of the supervision signal in the next stage. While this strategy leads to higher accuracy, many aspects (e.g., why teacher-student optimization helps) still need further exploration. This paper studies the problem from the perspective of controlling the strictness in training the teacher network. Existing approaches mostly use a hard distribution (e.g., one-hot vectors) in training, leading to a strict teacher which itself has a high accuracy, but we argue that the teacher needs to be more tolerant, although this often implies a lower accuracy. The implementation is very easy, with merely an extra loss term added to the teacher network,…
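
As a rough illustration of the two stages described in the abstract, the following NumPy sketch pairs a teacher loss carrying an extra "tolerance" term with a student loss that mixes hard labels and teacher outputs. The abstract is cut off before it specifies the extra term, so the penalty used here (the gap between the top confidence and the mean of the next `k - 1` confidences) is only an assumption chosen for illustration. The function names and the parameters `eta`, `k`, and `lam` are likewise hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class (hard, one-hot supervision).
    n = probs.shape[0]
    return float(-np.log(probs[np.arange(n), labels] + 1e-12).mean())

def tolerant_teacher_loss(logits, labels, eta=0.5, k=3):
    # ASSUMPTION: hard-label cross-entropy plus a hypothetical tolerance penalty.
    # The penalty shrinks the gap between the top confidence and the mean of
    # the next k-1 confidences, so the output distribution becomes softer.
    probs = softmax(logits)
    top = np.sort(probs, axis=-1)[:, ::-1][:, :k]  # k largest, descending
    gap = top[:, 0] - top[:, 1:].mean(axis=-1)
    return cross_entropy(probs, labels) + eta * float(gap.mean())

def student_loss(student_logits, teacher_logits, labels, lam=0.5):
    # The teacher provides part of the supervision: mix hard-label
    # cross-entropy with KL(teacher || student) on the soft outputs.
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean()
    return lam * cross_entropy(p_s, labels) + (1 - lam) * float(kl)
```

With `eta=0` the teacher loss reduces to ordinary cross-entropy, which corresponds to the "strict" teacher the abstract contrasts against. In a generational setup, the trained student would presumably serve as the teacher for the next generation.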

Taxonomy

Topics: Online Learning and Analytics · Teaching and Learning Programming · Educational Leadership and Innovation