FEED: Feature-level Ensemble for Knowledge Distillation
SeongUk Park, Nojun Kwak

TL;DR
FEED introduces a versatile feature-level ensemble method for knowledge distillation that improves student network performance by transferring ensemble knowledge at the feature map level without extra test-time costs.
Contribution
The paper proposes FEED, a novel training algorithm for feature-map-based knowledge distillation using multiple teachers, with parallel and sequential variants that enhance generalization.
Findings
Parallel FEED improves accuracy on CIFAR-100 and ImageNet.
Sequential FEED provides insights into knowledge transfer dynamics.
No additional parameters or computations at test time.
Abstract
Knowledge Distillation (KD) aims to transfer knowledge in a teacher-student framework, by providing the predictions of the teacher network to the student network in the training stage to help the student network generalize better. It can use either a teacher with high capacity or an ensemble of multiple teachers. However, the latter is not convenient when one wants to use feature-map-based distillation methods. For a solution, this paper proposes a versatile and powerful training algorithm named FEature-level Ensemble for knowledge Distillation (FEED), which aims to transfer the ensemble knowledge using multiple teacher networks. We introduce a couple of training algorithms that transfer ensemble knowledge to the student at the feature map level. Among the feature-map-based distillation methods, using several non-linear transformations in parallel for transferring the knowledge of the…
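The idea of applying several non-linear transformations in parallel, one per teacher, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the 1x1-convolution-plus-ReLU transformation, the L2 normalization, and the absolute-difference distance are all assumptions chosen for illustration. It only shows the general shape of a feature-level loss averaged over multiple teachers.

```python
import numpy as np

def l2_normalize(feat, eps=1e-8):
    # Flatten each sample's feature map and scale it to unit L2 norm.
    flat = feat.reshape(feat.shape[0], -1)
    norm = np.linalg.norm(flat, axis=1, keepdims=True)
    return flat / (norm + eps)

def transform(feat, weight):
    # Hypothetical non-linear transformation: a 1x1 convolution
    # (channel mixing) followed by ReLU.
    # feat: (N, C, H, W), weight: (C_out, C)
    mixed = np.einsum("oc,nchw->nohw", weight, feat)
    return np.maximum(mixed, 0.0)

def parallel_feature_loss(student_feat, teacher_feats, weights):
    # Assumed form of the loss: the same student feature map goes through
    # one transformation per teacher, and each result is compared with
    # that teacher's feature map. The per-teacher distances are averaged.
    losses = []
    for t_feat, w in zip(teacher_feats, weights):
        s = l2_normalize(transform(student_feat, w))
        t = l2_normalize(t_feat)
        losses.append(np.mean(np.sum(np.abs(s - t), axis=1)))
    return float(np.mean(losses))

# Toy usage with random tensors standing in for real feature maps.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 8, 2, 2))
teachers = [rng.normal(size=(4, 8, 2, 2)) for _ in range(3)]
weights = [rng.normal(size=(8, 8)) for _ in range(3)]
loss = parallel_feature_loss(student, teachers, weights)
```

In this sketch the transformation weights are used only when computing the training loss, so dropping them afterwards leaves the student unchanged at inference, which is in line with the summary's claim of no additional parameters or computations at test time.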
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Knowledge Distillation
