Bayesian Dark Knowledge
Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling

TL;DR
This paper introduces a method to distill Bayesian neural network posterior samples into a single, efficient model, improving upon existing approaches in accuracy, simplicity, and computational efficiency.
Contribution
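The paper proposes a technique for distilling a Monte Carlo approximation of a Bayesian neural network's posterior predictive density into a single deep neural network, and reports better results than two recent approaches, one based on expectation propagation and one based on variational Bayes.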
Findings
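Better predictive accuracy than the expectation propagation and variational Bayes methods it is compared against
Simpler to implement than those methods
Cheaper at test time: prediction uses a single network instead of many sampled copies of the model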
Abstract
We consider the problem of Bayesian parameter estimation for deep neural networks, which is important in problem settings where we may have little data, and/or where we need accurate posterior predictive densities, e.g., for applications involving bandits or active learning. One simple approach to this is to use online Monte Carlo methods, such as SGLD (stochastic gradient Langevin dynamics). Unfortunately, such a method needs to store many copies of the parameters (which wastes memory), and needs to make predictions using many versions of the model (which wastes time). We describe a method for "distilling" a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network. We compare to two very recent approaches to Bayesian neural networks, namely an approach based on expectation propagation [Hernandez-Lobato and Adams,…
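Example
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a logistic regression in place of a deep network for both teacher and student, picks its hyperparameters arbitrarily, and assumes that the student is trained on slightly perturbed training inputs. The name sgld_distill and all of its parameters are hypothetical. The teacher weights follow SGLD updates; after burn-in, each new teacher sample supplies soft targets for one gradient step of the student, so no posterior samples are stored. The running Monte Carlo average is kept only to check the student afterwards.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgld_distill(X, y, X_eval, n_steps=4000, burn_in=1000, batch=32,
                 step=1e-2, prior_var=10.0, student_lr=0.05, seed=0):
    """Run SGLD on a logistic-regression 'teacher' and distill it online.

    Hypothetical helper for illustration only. Returns the student weights
    and a Monte Carlo estimate of the posterior predictive on X_eval.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w_teacher = np.zeros(d)
    w_student = np.zeros(d)
    mc_pred = np.zeros(len(X_eval))  # running average, for evaluation only
    n_kept = 0
    for t in range(n_steps):
        idx = rng.choice(n, size=batch, replace=False)
        p = sigmoid(X[idx] @ w_teacher)
        # Stochastic gradient of the log posterior:
        # Gaussian prior term plus rescaled minibatch likelihood term.
        grad = -w_teacher / prior_var + (n / batch) * X[idx].T @ (y[idx] - p)
        # Langevin update: half a step along the gradient plus
        # Gaussian noise whose variance equals the step size.
        w_teacher = (w_teacher + 0.5 * step * grad
                     + rng.normal(0.0, np.sqrt(step), size=d))
        if t < burn_in:
            continue
        # Distillation step (assumption: inputs are perturbed training points).
        x_d = X[idx] + rng.normal(0.0, 0.1, size=(batch, d))
        target = sigmoid(x_d @ w_teacher)   # soft labels from current sample
        q = sigmoid(x_d @ w_student)
        # Gradient of the cross-entropy between teacher and student outputs.
        w_student -= student_lr * x_d.T @ (q - target) / batch
        # Update the running Monte Carlo posterior predictive on X_eval.
        n_kept += 1
        mc_pred += (sigmoid(X_eval @ w_teacher) - mc_pred) / n_kept
    return w_student, mc_pred

# Toy usage on synthetic data (all values are made up for the example).
rng = np.random.default_rng(1)
w_true = np.array([1.5, -2.0, 0.5])
X = rng.normal(size=(200, 3))
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)
X_eval = rng.normal(size=(50, 3))

w_student, mc_pred = sgld_distill(X, y, X_eval)
gap = np.mean(np.abs(sigmoid(X_eval @ w_student) - mc_pred))
print(f"mean |student - Monte Carlo teacher| on held-out inputs: {gap:.3f}")
```

At test time the student needs one forward pass per input, whereas the Monte Carlo estimate needs one pass per retained sample, which is the cost the abstract describes.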
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Markov Chains and Monte Carlo Methods · Machine Learning and Algorithms
