Learn From the Past: Experience Ensemble Knowledge Distillation
Chaofei Wang, Shaowei Zhang, Shiji Song, Gao Huang

TL;DR
This paper introduces Experience Ensemble Knowledge Distillation (EEKD), a method that leverages the teacher's training experience by ensembling intermediate models saved during the teacher's training. It improves student performance at a lower training cost than standard ensemble distillation.
Contribution
The paper proposes a new distillation approach that incorporates the teacher's training experience through an ensemble of intermediate models, weighted adaptively by a self-attention module. It outperforms existing distillation methods.
Findings
EEKD outperforms mainstream knowledge distillation methods.
EEKD surpasses standard ensemble distillation at a lower training cost.
Strong ensemble teachers do not necessarily produce stronger students.
Abstract
Traditional knowledge distillation transfers the "dark knowledge" of a pre-trained teacher network to a student network but ignores the knowledge contained in the teacher's training process, which we call the teacher's experience. However, in realistic educational scenarios, the learning experience is often more important than the learning results. In this work, we propose a novel knowledge distillation method that integrates the teacher's experience into knowledge transfer, named experience ensemble knowledge distillation (EEKD). We uniformly save a moderate number of intermediate models during the teacher's training and then integrate their knowledge with an ensemble technique. A self-attention module adaptively assigns weights to the different intermediate models during knowledge transfer. Three principles of constructing EEKD on the quality,…
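The abstract names three ingredients: intermediate teacher models saved uniformly during training, an ensemble of their predictions, and a self-attention module that weights each model during transfer. Below is a minimal pure-Python sketch of how these pieces could fit together. It is not the authors' implementation. All function names are hypothetical. It also makes several assumptions that the abstract does not confirm. Checkpoints are evenly spaced in epochs. The attention query is a student feature vector and the keys are per-checkpoint teacher feature vectors, scored with plain scaled dot-product attention. The loss mixes a temperature-scaled KL term (Hinton-style distillation) with cross-entropy on the hard label, and the temperature and mixing weight are illustrative values only.

```python
import math
from typing import List, Sequence


def uniform_checkpoint_epochs(total_epochs: int, num_models: int) -> List[int]:
    """Hypothetical helper: evenly spaced epochs at which to save teacher checkpoints.

    The last entry is the final epoch, i.e. the fully trained teacher.
    """
    if not 1 <= num_models <= total_epochs:
        raise ValueError("need 1 <= num_models <= total_epochs")
    step = total_epochs / num_models
    return [round(step * (k + 1)) for k in range(num_models)]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Sequence[float], keys: Sequence[Sequence[float]]) -> List[float]:
    """Assumed weighting: scaled dot-product scores between the student query and
    each checkpoint key, normalised with softmax so the weights sum to one."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


def ensemble_soft_targets(teacher_logits: Sequence[Sequence[float]],
                          weights: Sequence[float], temperature: float) -> List[float]:
    """Weighted average of the checkpoints' temperature-softened predictions."""
    num_classes = len(teacher_logits[0])
    target = [0.0] * num_classes
    for w, logits in zip(weights, teacher_logits):
        probs = softmax(logits, temperature)
        for c in range(num_classes):
            target[c] += w * probs[c]
    return target


def kd_loss(student_logits: Sequence[float], target_probs: Sequence[float],
            temperature: float) -> float:
    """KL(target || student) on softened outputs, scaled by T^2 as in Hinton-style KD."""
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(target_probs, p_s) if t > 0)
    return temperature ** 2 * kl


def eekd_style_loss(student_logits: Sequence[float], student_feature: Sequence[float],
                    teacher_logits: Sequence[Sequence[float]],
                    teacher_features: Sequence[Sequence[float]], label: int,
                    temperature: float = 4.0, alpha: float = 0.9) -> float:
    """Hypothetical per-sample loss: attention-weighted ensemble distillation + cross-entropy."""
    weights = attention_weights(student_feature, teacher_features)
    target = ensemble_soft_targets(teacher_logits, weights, temperature)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * kd_loss(student_logits, target, temperature) + (1 - alpha) * ce


if __name__ == "__main__":
    print(uniform_checkpoint_epochs(240, 6))
    loss = eekd_style_loss(
        student_logits=[1.0, 0.5, -0.5],
        student_feature=[0.2, 0.8],
        teacher_logits=[[0.5, 0.2, 0.1], [1.5, 0.3, -0.2], [2.5, 0.1, -1.0]],
        teacher_features=[[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]],
        label=0,
    )
    print(f"loss = {loss:.4f}")
```

Because the weights come from a softmax, they are non-negative and sum to one, so the ensemble target remains a valid probability distribution. A learnable attention module would typically add trainable query and key projections and compute the weights per sample inside the training loop. This sketch omits both.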
Taxonomy
Topics: Image Enhancement Techniques · Advanced Neural Network Applications · Neural Networks and Reservoir Computing
Methods: Knowledge Distillation
