Noisy Natural Gradient as Variational Inference
Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse

TL;DR
This paper shows that noisy natural gradient methods can efficiently train Bayesian neural networks with fully factorized, matrix-variate, or full-covariance Gaussian variational posteriors, improving both uncertainty estimation and scalability.
Contribution
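The paper shows that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO). This makes it practical to train richer variational distributions, such as matrix-variate Gaussians, in modern-size neural networks.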
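As a minimal sketch of this idea (not the paper's code), the example below runs natural-gradient updates with weight noise on a full-covariance Gaussian posterior q(w) = N(mu, inv(lam)) for Bayesian linear regression, a case where the exact posterior is known and can be compared against. Assumptions: known observation noise, a zero-mean isotropic Gaussian prior, the exact Fisher X^T X / noise_var used in place of a sampled estimate, and the hypothetical helper name `noisy_natural_gradient_blr` with illustrative hyperparameters `lr` and `beta`.

```python
import numpy as np

def noisy_natural_gradient_blr(X, y, noise_var=0.25, prior_var=1.0,
                               lr=0.05, beta=0.1, num_steps=2000, seed=0):
    """Fit q(w) = N(mu, inv(lam)) to the posterior of Bayesian linear regression.

    Hypothetical helper: gradients are evaluated at weights sampled from q
    (weight noise), and the mean step is preconditioned by the covariance inv(lam).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    prior_prec = np.eye(d) / prior_var
    # For a linear-Gaussian likelihood with known noise_var, the Fisher equals the
    # negative Hessian of log p(y | X, w) and does not depend on w.
    fisher = X.T @ X / noise_var
    mu = np.zeros(d)
    lam = prior_prec.copy()  # start from the prior precision
    for _ in range(num_steps):
        # Weight noise: draw w ~ N(mu, inv(lam)) using the Cholesky factor lam = L L^T.
        chol = np.linalg.cholesky(lam)
        w = mu + np.linalg.solve(chol.T, rng.standard_normal(d))
        # Gradient of the log joint, log p(y | X, w) + log p(w), at the sampled w.
        grad = X.T @ (y - X @ w) / noise_var - prior_prec @ w
        # Move the precision toward Fisher + prior precision (exponential moving average).
        lam = (1.0 - beta) * lam + beta * (fisher + prior_prec)
        # Natural-gradient step on the mean: precondition the gradient by inv(lam).
        mu = mu + lr * np.linalg.solve(lam, grad)
    return mu, lam

# Toy data (illustrative only): 200 points, 3 features, noise std 0.5.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * data_rng.normal(size=200)

mu, lam = noisy_natural_gradient_blr(X, y)

# Exact Gaussian posterior for comparison.
post_prec = X.T @ X / 0.25 + np.eye(3)
post_mean = np.linalg.solve(post_prec, X.T @ y / 0.25)
print(np.round(mu, 3), np.round(post_mean, 3))
```

Because the likelihood here is linear-Gaussian, `lam` converges to the exact posterior precision and `mu` fluctuates around the exact posterior mean, with a spread that shrinks as `lr` decreases. For a neural network, the Fisher would instead have to be estimated from gradients at the sampled weights.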
Findings
Outperforms existing methods in predictive accuracy on regression benchmarks.
Matches Hamiltonian Monte Carlo's predictive variances more closely than existing methods.
Enhances exploration in active learning and reinforcement learning.
Abstract
Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation. Unfortunately, there is a tradeoff between cheap but simple variational families (e.g., fully factorized) and expensive, complicated inference procedures. We show that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO). This insight allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible to scale up to modern-size ConvNets. On standard regression benchmarks, our noisy K-FAC algorithm makes better predictions and matches Hamiltonian Monte Carlo's predictive variances better than existing methods. Its improved uncertainty estimates lead to more efficient…
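For the matrix-variate case, a useful property is that a Kronecker-factored covariance can be sampled using only its two small factors, never the full (in_dim · out_dim) × (in_dim · out_dim) matrix. A minimal sketch, assuming a layer's weight covariance has the form scale · A^{-1} ⊗ G^{-1} under column-major vec, with A standing for an input-side factor and G for an output-gradient-side factor as in K-FAC's usual convention. The helper name `sample_matrix_variate_gaussian`, its arguments, and the omission of damping are illustrative assumptions, not the paper's noisy K-FAC implementation.

```python
import numpy as np

def sample_matrix_variate_gaussian(mean, a_factor, g_factor, scale, num_samples, rng):
    """Draw weight matrices W (out_dim x in_dim) with
    vec(W) ~ N(vec(mean), scale * kron(inv(a_factor), inv(g_factor))),
    where vec() stacks columns (column-major).

    Hypothetical helper: a_factor (in_dim x in_dim) and g_factor (out_dim x out_dim)
    stand in for Kronecker factors of the Fisher; damping is omitted.
    """
    la = np.linalg.cholesky(a_factor)  # a_factor = la @ la.T
    lg = np.linalg.cholesky(g_factor)  # g_factor = lg @ lg.T
    eps = rng.standard_normal((num_samples,) + mean.shape)
    # inv(lg.T) @ eps @ inv(la) has row covariance inv(g_factor) and
    # column covariance inv(a_factor), i.e. the Kronecker-factored covariance.
    noise = np.linalg.inv(lg.T) @ eps @ np.linalg.inv(la)
    return mean + np.sqrt(scale) * noise

rng = np.random.default_rng(0)
a_factor = np.array([[2.0, 0.3, 0.0],
                     [0.3, 1.0, 0.2],
                     [0.0, 0.2, 1.5]])  # in_dim = 3
g_factor = np.array([[1.0, 0.4],
                     [0.4, 2.0]])       # out_dim = 2
mean = np.zeros((2, 3))
samples = sample_matrix_variate_gaussian(mean, a_factor, g_factor, scale=1.0,
                                         num_samples=200_000, rng=rng)

# Column-major vec of each sample: transpose to (in_dim, out_dim), then flatten rows.
flat = samples.transpose(0, 2, 1).reshape(len(samples), -1)
empirical_cov = np.cov(flat, rowvar=False)
expected_cov = np.kron(np.linalg.inv(a_factor), np.linalg.inv(g_factor))
```

The final comparison checks that the empirical covariance of the samples approaches the Kronecker product. Each draw only needs Cholesky factors and products of the small per-side matrices, which is the kind of saving that lets a Kronecker-factored posterior scale to larger layers.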
Taxonomy
Topics: Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification
Methods: Adam
