Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation
Zheng Li, Xiang Li, Lingfeng Yang, Jian Yang, Zhigeng Pan

TL;DR
This paper introduces TESKD, an approach in which multiple hierarchical students that share the teacher's backbone help improve the teacher network through the shared features, raising its accuracy on CIFAR-100 and ImageNet.
Contribution
The paper proposes a new teacher evolution method via self-knowledge distillation, enabling the teacher to learn from multiple students sharing the same backbone.
Findings
ResNet-18 achieves 79.15% accuracy on CIFAR-100, outperforming the baseline by 4.74%.
ResNet-18 achieves 71.14% accuracy on ImageNet, outperforming the baseline by 1.43%.
The method improves accuracy over the baseline across various network settings.
Abstract
Knowledge distillation usually transfers the knowledge from a pre-trained cumbersome teacher network to a compact student network, which follows the classical teacher-teaching-student paradigm. Based on this paradigm, previous methods mostly focus on how to efficiently train a better student network for deployment. Different from the existing practices, in this paper, we propose a novel student-helping-teacher formula, Teacher Evolution via Self-Knowledge Distillation (TESKD), where the target teacher (for deployment) is learned with the help of multiple hierarchical students by sharing the structural backbone. The diverse feedback from multiple students allows the teacher to improve itself through the shared feature representations. The effectiveness of our proposed framework is demonstrated by extensive experiments with various network settings on two standard benchmarks including…
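The abstract describes a teacher and several hierarchical students trained together on one shared backbone, but the text available here does not state the training objective. The sketch below is an illustration under stated assumptions, not the authors' formulation: it assumes each branch is supervised with cross-entropy and that each student is tied to the teacher by a standard temperature-softened KL distillation term. The names `joint_loss`, `temperature`, and `alpha` are hypothetical. The sketch works on raw logits and has no parameters, so it does not model how gradients from the student terms would reach the teacher through the shared backbone.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Cross-entropy of one example against an integer class label."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def joint_loss(teacher_logits, student_logits_list, label,
               temperature=3.0, alpha=1.0):
    """Assumed objective: teacher CE plus, per student, CE and a KD term.

    temperature and alpha are hypothetical hyperparameters, not values
    taken from the paper.
    """
    loss = cross_entropy(teacher_logits, label)
    soft_teacher = softmax(teacher_logits, temperature)
    for student_logits in student_logits_list:
        soft_student = softmax(student_logits, temperature)
        loss += cross_entropy(student_logits, label)
        # The T^2 factor is the usual rescaling for softened distillation terms.
        loss += alpha * temperature ** 2 * kl_divergence(soft_teacher, soft_student)
    return loss

# Made-up logits for a 3-class example with true class 0.
teacher = [2.0, 0.5, -1.0]
students = [[1.5, 0.7, -0.5], [1.0, 1.0, 0.0]]
total = joint_loss(teacher, students, label=0)
```

In an actual implementation, the teacher and student logits would come from heads attached to the same backbone, so minimizing a summed objective of this kind would update the shared features that the teacher also uses.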
Taxonomy
Topics: Technology-Enhanced Education Studies
