Exploring Dark Knowledge under Various Teacher Capacities and Addressing Capacity Mismatch
Wen-Shu Fan, Xin-Chun Li, De-Chuan Zhan

TL;DR
This paper investigates how teacher capacity affects dark knowledge in knowledge distillation. It finds that larger teachers can be less effective because their output probabilities are less distinct among non-ground-truth classes, and it proposes methods that address this capacity mismatch to improve student performance.
Contribution
It provides new empirical insights into the effects of teacher capacity on dark knowledge and introduces effective strategies to mitigate capacity mismatch in knowledge distillation.
Findings
Larger teachers produce less distinct probability distributions among non-ground-truth classes.
Teachers with different capacities show consistent relative class affinity.
Addressing capacity mismatch improves student model performance.
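The first two findings can be illustrated with a small sketch. It softens logits with a temperature, then measures how spread out the probabilities of the non-ground-truth classes are and how those classes are ranked. The spread metric (standard deviation of the renormalized wrong-class probabilities), the function names, and the logit values are all assumptions chosen for illustration; they are not taken from the paper and may differ from the measures the authors use.

```python
import math


def softened_probs(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def wrong_class_spread(logits, true_class, temperature=1.0):
    """Standard deviation of the renormalized non-ground-truth probabilities.

    Illustrative proxy for "distinctness among incorrect classes";
    an assumption, not necessarily the paper's metric.
    """
    probs = softened_probs(logits, temperature)
    wrong = [p for i, p in enumerate(probs) if i != true_class]
    total = sum(wrong)
    wrong = [p / total for p in wrong]
    mean = sum(wrong) / len(wrong)
    return math.sqrt(sum((p - mean) ** 2 for p in wrong) / len(wrong))


def wrong_class_ranking(logits, true_class):
    """Non-ground-truth class indices, ordered from most to least likely."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [i for i in order if i != true_class]


# Hypothetical logits for one sample whose true class is index 0.
small_teacher_logits = [6.0, 3.0, 1.0, -1.0]
large_teacher_logits = [9.0, 1.2, 1.0, 0.8]

small_spread = wrong_class_spread(small_teacher_logits, true_class=0, temperature=4.0)
large_spread = wrong_class_spread(large_teacher_logits, true_class=0, temperature=4.0)

print("small teacher spread:", small_spread)
print("large teacher spread:", large_spread)
print("same wrong-class ranking:",
      wrong_class_ranking(small_teacher_logits, 0)
      == wrong_class_ranking(large_teacher_logits, 0))
```

In this made-up example the larger teacher is more confident in the true class but nearly uniform over the wrong classes, while both teachers rank the wrong classes in the same order. That mirrors the pattern the findings describe, without reproducing any result from the paper.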
Abstract
Knowledge Distillation (KD) can transfer the "dark knowledge" of a well-performing yet large neural network to a weaker but lightweight one. From the view of output logits and softened probabilities, this paper goes deeper into the dark knowledge provided by teachers with different capacities. Two fundamental observations are: (1) a larger teacher tends to produce probability vectors with lower distinction among non-ground-truth classes; (2) teachers with different capacities are basically consistent in their cognition of relative class affinity. Through abundant experimental studies we verify these observations and provide in-depth empirical explanations for them. We argue that the distinctness among incorrect classes embodies the essence of dark knowledge. A larger and more accurate teacher lacks this distinctness, which hampers its teaching ability compared to a smaller teacher,…
Taxonomy
Topics: Education and Critical Thinking Development
