Can Students Beyond The Teacher? Distilling Knowledge from Teacher's Bias
Jianhua Zhang, Yi Gao, Ruyu Liu, Xu Cheng, Houxiang Zhang, Shengyong Chen

TL;DR
This paper introduces a novel bias rectification strategy in knowledge distillation that filters and rectifies teacher biases, enabling student models to surpass their teachers in performance.
Contribution
The paper proposes a new bias elimination and rectification method in knowledge distillation, allowing students to outperform teachers by addressing bias transfer issues.
Findings
Significant performance improvements in student models.
Versatility of the method across different KD frameworks.
First approach enabling students to surpass teachers.
Abstract
Knowledge distillation (KD) is a model compression technique that transfers knowledge from a large teacher model to a smaller student model to enhance its performance. Existing methods often assume that the student model is inherently inferior to the teacher model. However, we identify that the fundamental issue affecting student performance is the bias transferred by the teacher. Current KD frameworks transmit both right and wrong knowledge, introducing bias that misleads the student model. To address this issue, we propose a novel strategy to rectify bias and greatly improve the student model's performance. Our strategy involves three steps: First, we differentiate knowledge and design a bias elimination method to filter out biases, retaining only the right knowledge for the student model to learn. Next, we propose a bias rectification method to rectify the teacher model's wrong…
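The bias elimination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "wrong knowledge" means a sample where the teacher's top prediction disagrees with the ground-truth label, and it simply drops those samples from a standard temperature-scaled KL distillation loss. The function name `filtered_kd_loss`, the default temperature, and the masking rule are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def filtered_kd_loss(teacher_logits, student_logits, labels, temperature=4.0):
    """Hypothetical bias-filtered KD loss (illustrative assumption only).

    Samples where the teacher's argmax differs from the label are skipped,
    so the student only distills from predictions the teacher got right.
    Returns (mean loss over kept samples, number of kept samples).
    """
    kept = []
    for t, s, y in zip(teacher_logits, student_logits, labels):
        teacher_pred = max(range(len(t)), key=t.__getitem__)
        if teacher_pred != y:
            continue  # assumed filter: discard the teacher's wrong prediction
        p = softmax(t, temperature)
        q = softmax(s, temperature)
        # T^2 scaling is the usual convention in temperature-based KD
        kept.append(kl_divergence(p, q) * temperature ** 2)
    loss = sum(kept) / len(kept) if kept else 0.0
    return loss, len(kept)


if __name__ == "__main__":
    teacher = [[5.0, 1.0, 0.0], [0.5, 3.0, 0.2]]  # second sample: teacher is wrong
    student = [[2.0, 1.5, 0.0], [0.0, 0.0, 0.0]]
    labels = [0, 2]
    loss, n_kept = filtered_kd_loss(teacher, student, labels)
    print(f"loss={loss:.4f} over {n_kept} kept sample(s)")
```

In this sketch the filtered samples contribute nothing to the distillation term. The abstract indicates that the paper goes further and rectifies the teacher's wrong predictions rather than only discarding them; that step is not modeled here.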
