TL;DR
This paper introduces a natural Langevin dynamics method for neural networks that uses Fisher matrix approximations to improve Bayesian posterior sampling, combining natural gradient descent with Langevin dynamics.
Contribution
It proposes preconditioning Langevin dynamics for neural networks with the Fisher matrix, relying on Fisher matrix approximations that remain tractable for large models in order to improve Bayesian posterior sampling.
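For orientation, one generic form of a preconditioned Langevin step is written out here. It is a sketch under stated assumptions and not necessarily the paper's exact update. The symbols are introduced for illustration only: G is a fixed symmetric positive-definite preconditioner (for instance an approximate inverse Fisher matrix, up to scaling), ε is a step size, and π(θ | D) is the posterior. If G depends on θ, an additional correction term is generally needed.

```latex
\theta_{t+1} = \theta_t
  + \frac{\epsilon}{2}\, G\, \nabla_\theta \log \pi(\theta_t \mid \mathcal{D})
  + \sqrt{\epsilon}\; G^{1/2}\, \xi_t,
\qquad \xi_t \sim \mathcal{N}(0, I)
```

Taking G = I recovers the plain, unpreconditioned Langevin step.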
Findings
Preconditioning SGLD with an approximate Fisher matrix improves its performance over plain SGLD.
With Fisher preconditioning, SGLD becomes comparable to dropout as a regularizer.
Small-scale experiments on MNIST validate the approach.
Abstract
One way to avoid overfitting in machine learning is to use model parameters distributed according to a Bayesian posterior given the data, rather than the maximum likelihood estimator. Stochastic gradient Langevin dynamics (SGLD) is one algorithm to approximate such Bayesian posteriors for large models and datasets. SGLD is a standard stochastic gradient descent to which is added a controlled amount of noise, specifically scaled so that the parameter converges in law to the posterior distribution [WT11, TTV16]. The posterior predictive distribution can be approximated by an ensemble of samples from the trajectory. Choice of the variance of the noise is known to impact the practical behavior of SGLD: for instance, noise should be smaller for sensitive parameter directions. Theoretically, it has been suggested to use the inverse Fisher information matrix of the model as the variance of…
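The SGLD recipe described in the abstract (a stochastic gradient step plus noise scaled to the step size) can be sketched on a toy problem. This is a minimal illustration, not the paper's code. The model (inferring the mean of unit-variance Gaussian data under a Gaussian prior), the function name `sgld_gaussian_mean`, and all hyperparameter values are assumptions chosen so that the exact posterior is known. A constant step size is used for simplicity; it is a simplification that leaves a small bias in the sampled distribution.

```python
import numpy as np

def sgld_gaussian_mean(x, prior_var=100.0, step=1e-4, batch=10,
                       n_steps=50_000, burn_in=2_000, seed=0):
    """Draw SGLD samples of theta for x_i ~ N(theta, 1), theta ~ N(0, prior_var).

    Hypothetical helper for illustration; constant step size assumed.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    theta = 0.0
    samples = np.empty(n_steps - burn_in)
    for t in range(n_steps):
        # Minibatch of indices, drawn with replacement.
        idx = rng.integers(0, n, size=batch)
        # Gradient of the log prior with respect to theta.
        grad_prior = -theta / prior_var
        # Minibatch log-likelihood gradient, rescaled to the full dataset.
        grad_lik = (n / batch) * np.sum(x[idx] - theta)
        # Injected Gaussian noise whose variance equals the step size.
        noise = rng.normal(0.0, np.sqrt(step))
        theta = theta + 0.5 * step * (grad_prior + grad_lik) + noise
        if t >= burn_in:
            samples[t - burn_in] = theta
    return samples

# Synthetic data (assumed for the example).
data_rng = np.random.default_rng(1)
x = data_rng.normal(1.5, 1.0, size=100)
samples = sgld_gaussian_mean(x)

# Exact posterior for this conjugate model, for comparison.
post_var = 1.0 / (1.0 / 100.0 + len(x))
post_mean = post_var * x.sum()
print("SGLD mean / exact mean:", samples.mean(), post_mean)
print("SGLD var  / exact var: ", samples.var(), post_var)
```

Averaging predictions over `samples` is the toy analogue of approximating the posterior predictive distribution by an ensemble taken from the trajectory.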
Taxonomy
Methods: Dropout
