Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation

Giung Nam; Hyungi Lee; Byeongho Heo; Juho Lee

arXiv:2206.15047 · cs.LG · July 1, 2022 · 1 citation

TL;DR

This paper introduces a novel ensemble distillation method that uses weight averaging of subnetworks and input perturbations to effectively transfer diversity from ensemble teachers to a single student network, improving performance in image classification.

Contribution

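The paper proposes a weight averaging technique for the student's subnetworks and an input perturbation strategy that together improve knowledge transfer in ensemble distillation. Because the subnetworks are averaged into a single network before inference, the student incurs no additional inference cost.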

Findings

1. Significant performance improvements on image classification tasks.
2. Effective transfer of ensemble diversity to a single student network.
3. No additional inference cost after training.

Abstract

Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders their use in resource-limited environments. This motivates distilling knowledge from the ensemble teacher into a smaller student network, and there are two important design choices for this ensemble distillation: 1) how to construct the student network, and 2) what data should be shown during training. In this paper, we propose a weight averaging technique in which a student with multiple subnetworks is trained to absorb the functional diversity of the ensemble teachers; those subnetworks are then properly averaged for inference, giving a single student network with no additional inference cost. We also propose a perturbation strategy that seeks inputs from which the diversities of teachers can be better transferred to the student. Combining these two, our method…
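The weight-averaging step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the subnetworks are parameterized or what "properly averaged" means in detail, so everything here is an assumption for illustration: the `Params` container, the `average_subnetworks` and `linear_forward` helpers, and the choice of plain uniform averaging are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of values.
# This layout is an assumption for illustration, not the paper's format.
Params = Dict[str, List[float]]


def average_subnetworks(subnets: List[Params]) -> Params:
    """Uniformly average same-shaped parameter dicts, key by key.

    Assumption: plain uniform averaging. The paper's exact averaging
    rule may differ.
    """
    if not subnets:
        raise ValueError("need at least one subnetwork")
    averaged: Params = {}
    for key in subnets[0]:
        # Group the i-th value of this parameter across all subnetworks.
        columns = zip(*(net[key] for net in subnets))
        averaged[key] = [sum(col) / len(subnets) for col in columns]
    return averaged


def linear_forward(params: Params, x: List[float]) -> float:
    """Toy linear model: dot(w, x) + b."""
    return sum(w * xi for w, xi in zip(params["w"], x)) + params["b"][0]


# Two toy "subnetworks" of the student, as they might look after training.
subnets = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]

# A single set of parameters is kept for inference.
student = average_subnetworks(subnets)
print(student)
print(linear_forward(student, [1.0, 1.0]))
```

For this toy linear model, the averaged student's output equals the mean of the subnetworks' outputs. That equality does not hold in general for nonlinear networks, which is presumably why the abstract stresses that the subnetworks must be "properly" averaged.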

Code & Models

Repositories

cs-giung/distill-latentbe (JAX, official)

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification