FEED: Feature-level Ensemble for Knowledge Distillation

SeongUk Park; Nojun Kwak

arXiv:1909.10754 · cs.CV · September 25, 2019 · 23 citations

TL;DR

FEED introduces a versatile feature-level ensemble method for knowledge distillation that improves student network performance by transferring ensemble knowledge at the feature map level without extra test-time costs.

Contribution

The paper proposes FEED, a novel training algorithm for feature-map-based knowledge distillation using multiple teachers, with parallel and sequential variants that enhance generalization.

Findings

1. Parallel FEED improves accuracy on CIFAR-100 and ImageNet.

2. Sequential FEED provides insights into knowledge transfer dynamics.

3. No additional parameters or computations at test time.

Abstract

Knowledge Distillation (KD) aims to transfer knowledge in a teacher-student framework by providing the predictions of the teacher network to the student network during training, which helps the student network generalize better. It can use either a single high-capacity teacher or an ensemble of multiple teachers. However, the latter is not convenient when one wants to use feature-map-based distillation methods. As a solution, this paper proposes a versatile and powerful training algorithm named FEature-level Ensemble for knowledge Distillation (FEED), which aims to transfer the ensemble knowledge using multiple teacher networks. We introduce a couple of training algorithms that transfer ensemble knowledge to the student at the feature map level. Among the feature-map-based distillation methods, using several non-linear transformations in parallel for transferring the knowledge of the…
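The idea of transferring ensemble knowledge at the feature map level can be illustrated with a small sketch. This is not the paper's implementation: the mean-squared-error loss, the single-layer student, the random stand-in teacher features, and the names `student_features`, `feature_ensemble_loss`, and `transforms` are all assumptions made for illustration. It only shows the general shape of the technique, in which one student feature is passed through a separate non-linear transform per teacher and compared against that teacher's feature.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def student_features(x, w_student):
    """Hypothetical student backbone: one linear layer followed by ReLU."""
    return relu(x @ w_student)


def feature_ensemble_loss(f_student, teacher_feats, transforms):
    """Average over teachers of the MSE between a transformed student
    feature and that teacher's feature. MSE is an assumed choice here."""
    losses = []
    for f_teacher, w_k in zip(teacher_feats, transforms):
        # One non-linear transform per teacher, applied in parallel
        # to the same student feature.
        adapted = relu(f_student @ w_k)
        losses.append(np.mean((adapted - f_teacher) ** 2))
    return float(np.mean(losses))


rng = np.random.default_rng(0)
batch, d_in, d_feat, n_teachers = 4, 8, 16, 3

x = rng.normal(size=(batch, d_in))
w_student = rng.normal(size=(d_in, d_feat))

# Training-only transforms, one per teacher (hypothetical shapes).
transforms = [rng.normal(size=(d_feat, d_feat)) for _ in range(n_teachers)]

# Stand-ins for frozen teacher feature maps; real teachers would be
# pretrained networks evaluated on the same input batch.
teacher_feats = [relu(rng.normal(size=(batch, d_feat))) for _ in range(n_teachers)]

f_s = student_features(x, w_student)
loss = feature_ensemble_loss(f_s, teacher_feats, transforms)
print(f"feature-level ensemble loss: {loss:.4f}")
```

In this sketch the per-teacher transforms appear only inside the training loss, so inference would call `student_features` alone, which is consistent with the claim of no additional parameters or computations at test time.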

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Knowledge Distillation