A New Training Framework for Deep Neural Network
Zhenyan Hou, Wenxuan Fan

TL;DR
This paper introduces Self Distillation, a new training framework that lets neural networks learn from themselves without pre-trained teachers, reducing computational and storage overhead while maintaining high performance across a variety of tasks.
Contribution
The paper proposes a novel Self Distillation framework that eliminates the need for pre-trained teacher models in knowledge distillation, simplifying training and deployment.
Findings
Improves performance across multiple tasks and datasets
Reduces computational and storage overheads
Effective without pre-trained teacher models
Abstract
Knowledge distillation is the process of transferring knowledge from a large model to a small one. In this process, the small model learns the generalization ability of the large model and retains performance close to that of the large model. Knowledge distillation thus provides a training method for transferring knowledge between models, which facilitates deployment and speeds up inference. However, previous distillation methods require pre-trained teacher models, which still incur computational and storage overhead. In this paper, a novel general training framework called Self Distillation (SD) is proposed. We demonstrate the effectiveness of our method through its performance improvements on diverse tasks and benchmark datasets.
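The abstract describes distillation only as a small model learning from a large one. As a point of reference, here is a minimal sketch of the widely used teacher-based soft-target loss (a temperature-scaled KL divergence to the teacher's distribution plus ordinary cross-entropy on the true label). This is the pre-trained-teacher setup SD aims to avoid, not the authors' SD objective, which this summary does not specify. The function names and the `temperature` and `alpha` parameters are illustrative assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed with the log-sum-exp trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """Teacher-based distillation loss for one example (illustrative, not the paper's SD loss).

    alpha weights the soft-target term and (1 - alpha) weights the hard-label cross-entropy.
    The KL term is scaled by T**2 so its gradient magnitude stays comparable across temperatures.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL(p_teacher || p_student) between the temperature-softened distributions
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student))
    # Ordinary cross-entropy against the ground-truth class at T = 1
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce
```

For example, `kd_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], label=0)` scores one student prediction against one teacher prediction. In a teacher-free setting, `teacher_logits` would have to come from the network itself; how SD obtains such targets is not described in this summary.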
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Knowledge Distillation · Label Smoothing
