Dual-Forward Path Teacher Knowledge Distillation: Bridging the Capacity Gap Between Teacher and Student
Tong Li, Long Liu, Yihang Hu, Hu Chen, and Shifeng Chen

TL;DR
This paper introduces DFPT-KD, a knowledge distillation method that replaces the pre-trained teacher with a dual-forward path teacher, tuned via prompt-based learning, to bridge the capacity gap between teacher and student networks and improve student performance.
Contribution
The paper proposes a dual-forward path teacher with prompt-based tuning that transfers knowledge more effectively than traditional methods when the teacher and student differ widely in capacity.
Findings
DFPT-KD outperforms vanilla KD in student performance.
Fine-tuning the prompt-based forward path (DFPT-KD+) further enhances accuracy.
DFPT-KD+ achieves state-of-the-art results in knowledge distillation.
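For context, the vanilla KD baseline named in the findings is commonly formulated as a weighted sum of a hard-label cross-entropy term and a temperature-softened KL term between teacher and student outputs. The sketch below illustrates that standard baseline only; it is not the DFPT-KD method, and the function names, `temperature=4.0`, and `alpha=0.9` are illustrative assumptions rather than values taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def vanilla_kd_loss(student_logits, teacher_logits, label,
                    temperature=4.0, alpha=0.9):
    """Standard (vanilla) KD loss for a single example.

    temperature and alpha are assumed defaults for illustration only.
    """
    # Hard-label cross-entropy on the student's unsoftened prediction.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # KL(teacher || student) on temperature-softened distributions.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)

    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [5.0, 1.0, -2.0]   # hypothetical teacher logits
    student = [1.0, 0.5, 0.0]    # hypothetical student logits
    print(vanilla_kd_loss(student, teacher, label=0))
```

In this formulation the teacher's outputs are fixed, so a large capacity gap shows up as a soft-target distribution the student cannot match closely. That fixed supervision is what the dual-forward path teacher is meant to replace.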
Abstract
Knowledge distillation (KD) provides an effective way to improve the performance of a student network under the guidance of pre-trained teachers. However, this approach usually brings in a large capacity gap between teacher and student networks, limiting the distillation gains. Previous methods addressing this problem either discard accurate knowledge representation or fail to dynamically adjust the transferred knowledge, which makes them less effective at closing the capacity gap and hinders students from achieving performance comparable to that of the pre-trained teacher. In this work, we extend the idea of prompt-based learning to address the capacity gap problem, and propose Dual-Forward Path Teacher Knowledge Distillation (DFPT-KD), which replaces the pre-trained teacher with a novel dual-forward path teacher to supervise the learning of the student. The key to DFPT-KD is prompt-based…
