TL;DR
This paper introduces Gaussian distillation, a novel method for compressing ensemble uncertainty estimates into a single distribution using a deep latent factor model, enabling efficient uncertainty preservation in smaller models.
Contribution
The paper proposes Gaussian distillation, a new distribution-based knowledge distillation method that effectively preserves uncertainty in compressed models, outperforming existing techniques.
Findings
Gaussian distillation outperforms baseline methods on benchmark datasets.
It effectively preserves uncertainty in language model fine-tuning.
It also works well under distribution shift.
Abstract
Deep ensembles deliver state-of-the-art, reliable uncertainty quantification, but their heavy computational and memory requirements hinder their practical deployment in real applications such as on-device AI. Knowledge distillation compresses an ensemble into small student models, but existing techniques struggle to preserve uncertainty, partly because reducing the size of DNNs typically results in variation reduction. To resolve this limitation, we introduce a new method of distribution distillation (i.e., compressing a teacher ensemble into a student distribution instead of a student ensemble) called Gaussian distillation, which estimates the distribution of a teacher ensemble through a special Gaussian process called the deep latent factor model (DLF) by treating each member of the teacher ensemble as a realization of a certain stochastic process. The mean and covariance functions in…
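To make the "ensemble members as realizations of a Gaussian process" idea concrete, here is a minimal sketch. It is not the paper's method: it swaps the deep latent factor model for a plain linear factor model fit by SVD, and it works on ensemble outputs at a fixed set of inputs rather than on functions. The names `fit_gaussian_factor_model` and `sample_predictions` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_gaussian_factor_model(member_outputs, num_factors):
    """Fit a Gaussian with low-rank covariance to ensemble outputs.

    member_outputs: array of shape (M, N), one row per ensemble member,
        one column per evaluation point (assumed layout).
    num_factors: number of latent factors K to keep.
    Returns the mean (N,) and loadings (K, N) such that
    loadings.T @ loadings approximates the sample covariance.
    """
    num_members = member_outputs.shape[0]
    mean = member_outputs.mean(axis=0)
    centered = member_outputs - mean
    # Thin SVD of the centered outputs; rows of vt are principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Scale directions so their outer products sum to the sample covariance.
    scale = singular_values[:num_factors, None] / np.sqrt(num_members - 1)
    loadings = scale * vt[:num_factors]
    return mean, loadings


def sample_predictions(mean, loadings, num_samples, rng):
    """Draw new 'student' predictions as mean + z @ loadings, z ~ N(0, I)."""
    latent = rng.standard_normal((num_samples, loadings.shape[0]))
    return mean + latent @ loadings
```

In this simplified setting, only the mean vector and the K loading vectors need to be stored, and fresh predictions can be sampled at will, so the spread of the teacher ensemble is kept without keeping every member. When `num_factors` is smaller than the rank of the centered outputs, the fitted covariance is a low-rank approximation and some variance is discarded.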