Learning to Teach with Student Feedback
Yitao Liu, Tianxiang Sun, Xipeng Qiu, Xuanjing Huang

TL;DR
This paper introduces Interactive Knowledge Distillation (IKD), a framework in which the teacher learns from student feedback to generate soft targets tailored to that student, improving model compression for NLP tasks.
Contribution
The paper proposes a new interactive framework for knowledge distillation that enables the teacher to adapt based on student feedback, enhancing training effectiveness.
Findings
IKD outperforms traditional KD methods on NLP tasks.
The iterative training process improves student model accuracy.
The teacher adapts dynamically to the student's training progress.
Abstract
Knowledge distillation (KD) has gained much attention due to its effectiveness in compressing large-scale pre-trained models. In typical KD methods, the small student model is trained to match the soft targets generated by the big teacher model. However, the interaction between student and teacher is one-way. The teacher is usually fixed once trained, resulting in static soft targets to be distilled. This one-way interaction leads to the teacher's inability to perceive the characteristics of the student and its training progress. To address this issue, we propose Interactive Knowledge Distillation (IKD), which also allows the teacher to learn to teach from the feedback of the student. In particular, IKD trains the teacher model to generate specific soft targets at each training step for a given student. Joint optimization for both teacher and student is achieved by two iterative steps:…
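To make the "typical KD" setup in the abstract concrete, here is a minimal sketch of a student matching fixed teacher soft targets. It is an illustration, not the paper's implementation. The temperature-softened softmax and the KL-divergence objective are assumptions taken from common KD practice, and the names log_softmax and soft_target_loss are hypothetical. The IKD teacher update is not shown, because the abstract elides its two steps.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of a temperature-softened softmax (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions.

    Assumption: the temperature value and the KL objective follow common KD
    practice; the source abstract does not specify either.
    """
    log_p = log_softmax(teacher_logits, temperature)  # static teacher soft targets
    log_q = log_softmax(student_logits, temperature)  # student predictions
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(soft_target_loss([1.0, 2.0, 3.0], teacher))  # student agrees with teacher
    print(soft_target_loss([3.0, 2.0, 1.0], teacher))  # student disagrees with teacher
```

In this sketch the teacher logits are plain constants, which mirrors the one-way interaction the abstract criticizes: nothing the student does changes the targets it receives.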
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
Methods: Knowledge Distillation
