Hydra: Preserving Ensemble Diversity for Model Distillation

Linh Tran; Bastiaan S. Veeling; Kevin Roth; Jakub Swiatkowski; Joshua V. Dillon; Jasper Snoek; Stephan Mandt; Tim Salimans; Sebastian Nowozin; Rodolphe Jenatton

arXiv:2001.04694 · cs.LG · March 22, 2021 · 36 cites

TL;DR

Hydra is a distillation method that uses a multi-headed neural network to preserve ensemble diversity and uncertainty estimates in a compact model, improving over approaches that distill only the ensemble's average prediction.

Contribution

The paper introduces Hydra, a multi-headed neural network for ensemble distillation that maintains diversity and uncertainty information.

Findings

01

Hydra outperforms traditional distillation methods on both classification and regression tasks.

02

Hydra better captures uncertainty on both in-domain and out-of-distribution data.

03

A slight increase in parameter count yields significant performance gains.

Abstract

Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to preserve its predictive behavior. Most existing distillation formulations summarize the ensemble by capturing its average predictions. As a result, the diversity of the ensemble predictions, stemming from each member, is lost. Thus, the distilled model cannot provide a measure of uncertainty comparable to that of the original ensemble. To retain more faithfully the diversity of the ensemble, we propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra. The shared body network learns a joint…
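To make the contrast in the abstract concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes a toy network whose shared body is a single ReLU layer and whose heads are each one linear layer followed by a softmax. It also assumes a per-head KL divergence as the distillation loss. All function names (hydra_forward, multihead_distillation_loss, disagreement, and so on) are hypothetical. The example shows why averaging loses information. Two confident members that disagree average to [0.5, 0.5], and a single-output student can match that average with zero loss while showing zero member disagreement. With one head per member, the disagreement is kept.

```python
import math
from typing import List, Sequence

# Hypothetical toy sketch of Hydra-style distillation (not the authors' code).
Probs = Sequence[float]
Matrix = List[List[float]]


def relu_linear(w: Matrix, x: Sequence[float]) -> List[float]:
    """Shared body, assumed here to be a single ReLU layer."""
    return [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in w]


def softmax_linear(w: Matrix, h: Sequence[float]) -> List[float]:
    """One head: a linear layer followed by a numerically stable softmax."""
    logits = [sum(wij * hj for wij, hj in zip(row, h)) for row in w]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hydra_forward(x: Sequence[float], body_w: Matrix, head_ws: List[Matrix]) -> List[List[float]]:
    """Run the shared body once, then apply every head to the same features."""
    h = relu_linear(body_w, x)
    return [softmax_linear(hw, h) for hw in head_ws]


def entropy(p: Probs) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl(p: Probs, q: Probs) -> float:
    """KL(p || q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_probs(preds: List[Probs]) -> List[float]:
    """Class-wise average over ensemble members (or heads)."""
    return [sum(col) / len(preds) for col in zip(*preds)]


def averaged_distillation_loss(student: Probs, members: List[Probs]) -> float:
    """Baseline: a single output matched to the ensemble's mean prediction."""
    return kl(mean_probs(members), student)


def multihead_distillation_loss(heads: List[Probs], members: List[Probs]) -> float:
    """Assumed Hydra-style objective: head m matches member m, losses averaged."""
    assert len(heads) == len(members), "expects one head per ensemble member"
    return sum(kl(p, q) for p, q in zip(members, heads)) / len(members)


def disagreement(preds: List[Probs]) -> float:
    """Entropy of the mean minus the mean entropy (mutual-information style)."""
    return entropy(mean_probs(preds)) - sum(entropy(p) for p in preds) / len(preds)


# Two confident ensemble members that disagree on a single input.
members = [[0.9, 0.1], [0.1, 0.9]]
mean = mean_probs(members)  # [0.5, 0.5]

# A single-output student equal to the mean has zero loss but no disagreement.
print(averaged_distillation_loss(mean, members))  # 0.0
print(disagreement([mean]))  # 0.0

# Heads that reproduce each member also have zero loss and keep the disagreement.
print(multihead_distillation_loss(members, members))  # 0.0
print(disagreement(members) > 0)  # True
```

In this sketch the body runs once per input and only the heads are replicated. This is why a multi-headed student can add comparatively few parameters over a single distilled model. The sketch does not reproduce how the paper sizes its heads or schedules training.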

Code & Models

Repositories

kaung-htet-myat/Multi-teachers-Knowledge-Distillation (TensorFlow)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis

Methods: Hydra