Students are the Best Teacher: Exit-Ensemble Distillation with Multi-Exits
Hojung Lee, Jong-Seok Lee

TL;DR
This paper introduces exit-ensemble distillation, a novel method that uses multi-exit CNN architectures to improve classification accuracy without pre-trained teachers, leveraging student-student and student-teacher collaboration.
Contribution
It presents a new knowledge distillation paradigm for multi-exit CNNs in which the ensemble of the exits guides training, improving both classification performance and convergence speed.
Findings
Significant accuracy improvements on various CNN architectures.
Faster convergence and improved training stability.
Effective without pre-trained teacher networks.
Abstract
This paper proposes a novel knowledge distillation-based learning method to improve the classification performance of convolutional neural networks (CNNs) without a pre-trained teacher network, called exit-ensemble distillation. Our method exploits the multi-exit architecture that adds auxiliary classifiers (called exits) in the middle of a conventional CNN, through which early inference results can be obtained. The idea of our method is to train the network using the ensemble of the exits as the distillation target, which greatly improves the classification performance of the overall network. Our method suggests a new paradigm of knowledge distillation; unlike the conventional notion of distillation where teachers only teach students, we show that students can also help other students and even the teacher to learn better. Experimental results demonstrate that our method achieves…
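The core idea in the abstract, using the ensemble of the exits as the distillation target, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the ensemble is the plain average of the exits' logits, that the target is softened with a temperature, and that each exit is pulled toward the target with a KL-divergence term. The function names and the temperature value are hypothetical, and the full training objective (for example, the cross-entropy term on the labels) is left out.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_logits(exit_logits):
    """Average the logits of all exits class by class (assumed ensemble rule)."""
    n_exits = len(exit_logits)
    return [sum(per_class) / n_exits for per_class in zip(*exit_logits)]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def exit_ensemble_loss(exit_logits, temperature=3.0):
    """Mean KL divergence from the softened ensemble target to each exit.

    exit_logits: one list of class logits per exit, with the final
    classifier treated as the last exit. The temperature is an
    illustrative value, not one taken from the paper.
    """
    target = softmax(ensemble_logits(exit_logits), temperature)
    per_exit = [
        kl_divergence(target, softmax(logits, temperature))
        for logits in exit_logits
    ]
    return sum(per_exit) / len(per_exit)


# Hypothetical logits for one sample: three exits, three classes.
example_logits = [
    [2.0, 0.5, -1.0],   # early exit
    [1.0, 1.2, -0.5],   # middle exit
    [3.0, 0.1, -2.0],   # final classifier
]
example_loss = exit_ensemble_loss(example_logits)
```

Because every exit, including the final classifier, receives the same ensemble target, the shallower exits contribute to the signal that trains the deepest one. That is the sense in which students can help the teacher in this setup.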
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Dropout · Average Pooling · Wide Residual Block · Grouped Convolution · Max Pooling · Residual Connection · Bottleneck Residual Block · 1x1 Convolution · Convolution
