Neural Collapse Inspired Knowledge Distillation
Shuoxi Zhang, Zijian Song, Kun He

TL;DR
This paper introduces a novel knowledge distillation method inspired by Neural Collapse, which enhances student model performance by transferring the teacher's geometric feature structure, leading to improved generalization and state-of-the-art results.
Contribution
The paper proposes a new distillation paradigm that incorporates Neural Collapse structure transfer, offering a more effective way to bridge the knowledge gap between teacher and student.
Findings
NCKD, the proposed Neural Collapse inspired distillation method, improves student model accuracy.
Transferring the teacher's NC structure to the student enhances generalization.
NCKD achieves state-of-the-art performance.
Abstract
Existing knowledge distillation (KD) methods have demonstrated the ability to bring student network performance on par with that of their teachers. However, the knowledge gap between the teacher and student remains significant and may hinder the effectiveness of the distillation process. In this work, we introduce the structure of Neural Collapse (NC) into the KD framework. NC typically occurs in the final phase of training, resulting in a graceful geometric structure where the last-layer features form a simplex equiangular tight frame. Such a phenomenon has improved the generalization of deep network training. We hypothesize that NC can also alleviate the knowledge gap in distillation, thereby enhancing student performance. This paper begins with an empirical analysis to bridge knowledge distillation and neural collapse. Through this analysis, we establish that…
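For readers unfamiliar with the simplex equiangular tight frame (ETF) mentioned in the abstract, here is a minimal NumPy sketch of the standard construction from the Neural Collapse literature. It only illustrates the geometry and is not the paper's NCKD method; the function name `simplex_etf` and the choice of K = 4 classes are assumptions made for this example.

```python
import numpy as np


def simplex_etf(num_classes: int) -> np.ndarray:
    """Return a (K, K) matrix whose columns are the K vertices of a simplex ETF.

    Standard construction: M = sqrt(K / (K - 1)) * (I - (1/K) * 1 1^T).
    Hypothetical helper for illustration only.
    """
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k  # removes the global mean
    return np.sqrt(k / (k - 1)) * centering


K = 4  # illustrative number of classes (assumption)
M = simplex_etf(K)

# Gram matrix of the vertices: diagonal entries are squared norms,
# off-diagonal entries are pairwise cosines (since the norms are 1).
gram = M.T @ M
norms = np.linalg.norm(M, axis=0)
off_diag = gram[~np.eye(K, dtype=bool)]

print("vertex norms:", norms)
print("pairwise cosines:", off_diag)
```

In this construction every vertex has unit norm and every pair of distinct vertices has cosine -1/(K - 1), the most widely separated configuration that K equal-norm vectors can take. Under Neural Collapse, the centered last-layer class means approach this arrangement up to rotation and scaling.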
Taxonomy
Topics: Neural Networks and Applications
Methods: Knowledge Distillation
