Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective
Helong Zhou, Liangchen Song, Jiajie Chen, Ye Zhou, Guoli Wang, Junsong Yuan, Qian Zhang

TL;DR
This paper investigates the bias-variance tradeoff in knowledge distillation with soft labels, revealing sample-wise variations and proposing weighted soft labels to improve distillation performance.
Contribution
It introduces a novel perspective on bias-variance tradeoff in soft label distillation and proposes weighted soft labels for adaptive handling of sample-wise bias-variance dynamics.
Findings
The bias-variance tradeoff varies from sample to sample during training.
Distillation performance negatively correlates with the number of certain samples.
Weighted soft labels improve distillation effectiveness.
Abstract
Knowledge distillation is an effective approach to leverage a well-trained network, or an ensemble of them, referred to as the teacher, to guide the training of a student network. The outputs from the teacher network are used as soft labels for supervising the training of a new network. Recent studies (Müller et al., 2019; Yuan et al., 2020) revealed an intriguing property of soft labels: making labels soft serves as a good regularization for the student network. From the perspective of statistical learning, regularization aims to reduce the variance; however, how bias and variance change when training with soft labels is not clear. In this paper, we investigate the bias-variance tradeoff brought by distillation with soft labels. Specifically, we observe that during training the bias-variance tradeoff varies from sample to sample. Further, under the same distillation temperature setting, we…
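For readers unfamiliar with the setup the abstract describes, the following is a minimal sketch of a conventional soft-label distillation loss, in which the teacher's temperature-softened outputs supervise the student alongside the ground-truth label. It is not the weighted soft labels method proposed in the paper; the function names, the `temperature` and `alpha` parameters, and their default values are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.9):
    """Generic distillation loss (assumed form, not the paper's weighted variant).

    Combines a hard-label term with a soft-label term computed from the
    teacher's softened outputs. `alpha` balances the two terms and the
    temperature**2 factor keeps their gradient scales comparable.
    """
    num_classes = len(student_logits)
    one_hot = [1.0 if i == label else 0.0 for i in range(num_classes)]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))

    soft_targets = softmax(teacher_logits, temperature)  # teacher soft labels
    soft_preds = softmax(student_logits, temperature)
    soft_loss = cross_entropy(soft_targets, soft_preds)

    return (1.0 - alpha) * hard_loss + alpha * temperature ** 2 * soft_loss


if __name__ == "__main__":
    # Hypothetical logits for a single three-class sample.
    print(distillation_loss([2.0, 1.0, 0.1], [3.0, 0.5, 0.2], label=0))
```

In this sketch every sample receives the same `alpha`. The paper's observation that the bias-variance tradeoff differs across samples is what motivates replacing such a fixed weight with a sample-wise one.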
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
