Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher
Yong Guo, Shulian Zhang, Haolin Pan, Jing Liu, Yulun Zhang, Jian Chen

TL;DR
This paper introduces Gap Preserving Distillation (GPD), a novel method that dynamically builds a teacher model during training to maintain a manageable performance gap, improving knowledge transfer efficiency and model deployment flexibility.
Contribution
The paper proposes a dynamic teacher framework with bidirectional mappings and reparameterization strategies, enabling effective distillation without a pre-trained teacher and improving student performance.
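"Reparameterization" here refers to rewriting a network's parameters into an equivalent form. The sketch below shows the general idea with a generic structural reparameterization, assuming two parallel linear branches. This is not necessarily the scheme GPD uses. Because the layers are linear, the branches fold into a single layer with identical outputs. The helper `merge_parallel_linear` and all shapes are hypothetical.

```python
import numpy as np

def merge_parallel_linear(w_a, b_a, w_b, b_b):
    """Fold two parallel linear branches, (W_a x + b_a) + (W_b x + b_b),
    into one equivalent layer W x + b. Generic illustration only (hypothetical helper)."""
    return w_a + w_b, b_a + b_b

rng = np.random.default_rng(0)
w_a, w_b = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
b_a, b_b = rng.normal(size=4), rng.normal(size=4)
x = rng.normal(size=3)

w, b = merge_parallel_linear(w_a, b_a, w_b, b_b)
two_branch = (w_a @ x + b_a) + (w_b @ x + b_b)  # output of the two-branch form
merged = w @ x + b                              # output of the folded single layer
print(np.allclose(two_branch, merged))  # True
```

By linearity, the merged layer reproduces the two-branch output up to floating-point rounding. A model can therefore train in a richer form and be deployed in a cheaper one.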
Findings
GPD outperforms existing distillation methods, with accuracy gains of up to 1.58%.
GPD achieves significant improvements when training from scratch or fine-tuning.
The method is effective across CNNs and transformer architectures.
Abstract
Knowledge distillation aims to transfer knowledge from a large teacher model to a compact student counterpart, often with a significant performance gap between the two. We find that an overly large performance gap can hamper training, an observation also confirmed by recent studies. To address this, we propose a Gap Preserving Distillation (GPD) method that trains an additional dynamic teacher model from scratch alongside the student to bridge this gap. In this way, it becomes possible to maintain a reasonable performance gap between teacher and student throughout the distillation process. To further strengthen distillation from the dynamic teacher to the student, we develop a hard strategy that forces them to share parameters and encourages parameter inheritance. Besides the hard strategy, we also build soft bidirectional mappings between them, which are built on an…
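The sketch below illustrates the hard parameter-sharing idea together with a standard distillation loss. It rests on two loud assumptions. First, the student is taken to be a channel slice of a wider dynamic teacher, a hypothetical sharing scheme that may differ from the paper's. Second, the loss is Hinton-style KL distillation with temperature `t`, swapped in as a common choice. The names `softmax`, `kd_loss`, `mlp` and all layer sizes are illustrative, and the soft bidirectional mappings are not modeled.

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, t=4.0):
    """Hinton-style KD: T^2 * KL(p_teacher || p_student), averaged over the batch."""
    p_t = softmax(teacher_logits, t)
    log_p_s = np.log(softmax(student_logits, t))
    log_p_t = np.log(p_t)
    return float(t**2 * np.mean(np.sum(p_t * (log_p_t - log_p_s), axis=-1)))

def mlp(x, w1, w2):
    """One ReLU hidden layer followed by a linear layer producing logits."""
    return np.maximum(x @ w1.T, 0.0) @ w2.T

rng = np.random.default_rng(0)
d_in, h_teacher, h_student, n_cls = 8, 32, 8, 5  # hypothetical sizes

# Dynamic teacher: same depth as the student but a wider hidden layer.
w1_t = rng.normal(scale=0.1, size=(h_teacher, d_in))
w2_t = rng.normal(scale=0.1, size=(n_cls, h_teacher))

# Assumed "hard" sharing: the student's weights are basic slices of the
# teacher's, which NumPy returns as views, so teacher updates are inherited.
w1_s = w1_t[:h_student]
w2_s = w2_t[:, :h_student]

x = rng.normal(size=(16, d_in))
loss = kd_loss(mlp(x, w1_s, w2_s), mlp(x, w1_t, w2_t))
print(np.shares_memory(w1_s, w1_t), loss > 0.0)  # True True
```

Because the student lives inside the teacher, the two networks are only a few channels apart at any point in training. This is one plausible way to keep the gap between them small, though the paper's own mechanism may differ.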
Taxonomy
Topics: Advanced Control Systems Optimization · Process Optimization and Integration
