Functional Ensemble Distillation
Coby Penso, Idan Achituve, Ethan Fetaya

TL;DR
This paper introduces Functional Ensemble Distillation (FED), a novel method to efficiently distill ensemble predictions into a single model, improving accuracy and uncertainty estimation, especially in limited data scenarios.
Contribution
The paper proposes a new distillation approach that captures prediction covariance and enhances performance using mixup augmentation, addressing limitations of existing methods.
Findings
FED outperforms current approaches in accuracy.
FED provides better uncertainty estimation.
Mixup augmentation significantly improves distillation results.
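For readers unfamiliar with the augmentation named in the findings, here is a minimal sketch of generic mixup: pairs of examples are blended with a weight drawn from a Beta distribution. The function name `mixup_batch`, the `alpha` default, and the toy data are illustrative assumptions. This summary does not say how FED combines mixup with distillation, so the sketch shows only the standard augmentation step, not the paper's pipeline.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=1.0, rng=None):
    """Generic mixup on one batch (hypothetical helper, not the paper's code).

    x:        array of shape (batch, features)
    y_onehot: array of shape (batch, classes)
    alpha:    Beta(alpha, alpha) concentration; 1.0 is an assumed default
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)            # mixing weight in [0, 1]
    perm = rng.permutation(x.shape[0])      # random partner for each example
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed, lam

# Toy data standing in for a real batch (assumption for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = np.eye(2)[[0, 1, 1, 0]]
x_mixed, y_mixed, lam = mixup_batch(x, y, alpha=1.0, rng=rng)
```

Each mixed label row is still a valid probability vector, because it is a convex combination of two one-hot rows.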
Abstract
Bayesian models have many desirable properties, most notably their ability to generalize from limited data and to properly estimate the uncertainty in their predictions. However, these benefits come at a steep computational cost, as Bayesian inference is, in most cases, computationally intractable. One popular approach to alleviate this problem is Monte Carlo estimation with an ensemble of models sampled from the posterior. However, this approach still comes at a significant computational cost, as one needs to store and run multiple models at test time. In this work, we investigate how to best distill an ensemble's predictions using an efficient model. First, we argue that current approaches that simply return a distribution over predictions cannot compute important properties, such as the covariance between predictions, which can be valuable for further processing. Second, in…
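To make the abstract's first argument concrete, the sketch below computes the Monte Carlo ensemble average and the covariance between predictions on two inputs. Random logits stand in for models sampled from the posterior; the names `ensemble_statistics` and `prediction_covariance`, and the shapes used, are assumptions for illustration rather than the authors' implementation.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_statistics(member_logits):
    """member_logits has shape (members, inputs, classes)."""
    probs = softmax(member_logits)       # per-member predictions
    mean_probs = probs.mean(axis=0)      # Monte Carlo estimate of the predictive
    return probs, mean_probs

def prediction_covariance(probs, i, j, cls):
    """Covariance, across ensemble members, between the probability of
    class `cls` on input i and on input j."""
    a = probs[:, i, cls]
    b = probs[:, j, cls]
    return np.mean((a - a.mean()) * (b - b.mean()))

# Hypothetical ensemble: 5 members, 3 inputs, 4 classes.
rng = np.random.default_rng(0)
member_logits = rng.normal(size=(5, 3, 4))
probs, mean_probs = ensemble_statistics(member_logits)
cov_01 = prediction_covariance(probs, 0, 1, cls=2)
```

Note that `mean_probs` alone is not enough to recover `cov_01`: the covariance needs per-member predictions (or a model that can generate them), which is the kind of information a distilled model returning only a predictive distribution discards.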
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
Methods: Mixup
