Switchable Online Knowledge Distillation
Biao Qian, Yang Wang, Hongzhi Yin, Richang Hong, Meng Wang

TL;DR
SwitOKD introduces an adaptive switching strategy during training to optimize the gap between teacher and student in online knowledge distillation, enhancing student performance while maintaining teacher quality.
Contribution
The paper proposes a novel switchable mechanism with an adaptive threshold to dynamically calibrate the distillation gap during training in online knowledge distillation.
Findings
Improves student accuracy over state-of-the-art methods.
Maintains teacher performance comparable to existing online distillation techniques.
Extends effectively to multiple network topologies.
Abstract
Online Knowledge Distillation (OKD) improves the involved models by reciprocally exploiting the difference between teacher and student. Several crucial questions about the gap between them (e.g., why and when does a large gap harm performance, especially for the student? How can the gap between teacher and student be quantified?) have received limited formal study. In this paper, we propose Switchable Online Knowledge Distillation (SwitOKD) to answer these questions. Rather than focusing on the accuracy gap at the test phase, as existing methods do, the core idea of SwitOKD is to adaptively calibrate the gap at the training phase, namely the distillation gap, via a switching strategy between two modes: expert mode (pause the teacher while keeping the student learning) and learning mode (restart the teacher). To possess an appropriate distillation gap, we further devise an adaptive switching threshold,…
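The switching idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract above is truncated before it defines the adaptive threshold, so the code takes a plain `threshold` argument as a stand-in. The gap measure (L1 distance between softened teacher and student outputs), the rule "enter expert mode when the gap exceeds the threshold", and all function names are assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_gap(teacher_logits, student_logits, temperature=1.0):
    """ASSUMED gap measure: L1 distance between softened predictions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(abs(t - s) for t, s in zip(p_teacher, p_student))


def choose_mode(gap, threshold):
    """ASSUMED rule: a gap above the threshold pauses the teacher."""
    return "expert" if gap > threshold else "learning"


def plan_step(teacher_logits, student_logits, threshold, temperature=1.0):
    """Decide which networks would be updated on this training step.

    `threshold` is a hypothetical placeholder for the paper's adaptive
    switching threshold, which the text above does not specify.
    """
    gap = distillation_gap(teacher_logits, student_logits, temperature)
    mode = choose_mode(gap, threshold)
    return {
        "gap": gap,
        "mode": mode,
        "update_student": True,                 # student learns in both modes
        "update_teacher": mode == "learning",   # teacher is paused in expert mode
    }


if __name__ == "__main__":
    # Teacher and student disagree strongly, so the teacher is paused.
    print(plan_step([10.0, 0.0, 0.0], [0.0, 0.0, 10.0], threshold=0.5))
    # Teacher and student agree, so both keep training.
    print(plan_step([2.0, 1.0, 0.0], [2.0, 1.0, 0.0], threshold=0.5))
```

In a real training loop, the returned flags would gate the optimizer steps for each network. The student is updated in both modes, which matches the abstract's description of expert mode as pausing only the teacher.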
Taxonomy
Topics: Online Learning and Analytics · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: Test · Knowledge Distillation
