L2M: Practical posterior Laplace approximation with optimization-driven second moment estimation
Christian S. Perone, Roberto Pereira Silveira, Thomas Paula

TL;DR
L2M introduces a practical method for posterior Laplace approximation in neural networks by leveraging the gradient second moment, estimated during standard optimization, to enable uncertainty quantification without additional computational cost.
Contribution
The paper proposes a simple, efficient approach to posterior Laplace approximation using gradient second moments from common optimizers, eliminating the need for curvature matrix computation.
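As a rough illustration of why the gradient second moment can stand in for curvature, the standard information-matrix equality can be written out as below. This is a sketch, not the paper's derivation: the diagonal form, the dataset-size factor $N$, the isotropic Gaussian prior with precision $\lambda$, and the use of per-example gradients $g$ are all assumptions made here for concreteness.

```latex
% Laplace approximation around a mode \theta^* of the posterior
p(\theta \mid \mathcal{D}) \approx \mathcal{N}\!\left(\theta^*,\, H^{-1}\right),
\qquad
H = -\nabla_\theta^2 \log p(\theta \mid \mathcal{D})\,\big|_{\theta = \theta^*}

% Information-matrix equality (holds under regularity conditions,
% with the expectation taken under the model's own predictive distribution)
\mathbb{E}\!\left[-\nabla_\theta^2 \log p(y \mid x, \theta)\right]
= \mathbb{E}\!\left[\nabla_\theta \log p(y \mid x, \theta)\,
                    \nabla_\theta \log p(y \mid x, \theta)^{\top}\right]

% Exponential moving average of squared gradients, as kept by Adam or RMSprop
v_t = \beta_2\, v_{t-1} + (1 - \beta_2)\, g_t \odot g_t
\;\approx\; \mathbb{E}\!\left[g \odot g\right]

% Assumed diagonal posterior precision built from that estimate
H_{ii} \approx N\, v_i + \lambda
```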
Findings
The method is easy to implement and needs only a few lines of code.
It adds no computational steps beyond what the optimizer already does and introduces no new hyperparameters.
It provides reasonable uncertainty estimates for neural networks.
Abstract
Uncertainty quantification for deep neural networks has recently evolved through many techniques. In this work, we revisit Laplace approximation, a classical approach for posterior approximation that is computationally attractive. However, instead of computing the curvature matrix, we show that, under some regularity conditions, the Laplace approximation can be easily constructed using the gradient second moment. This quantity is already estimated by many exponential moving average variants of Adagrad such as Adam and RMSprop, but is traditionally discarded after training. We show that our method (L2M) does not require changes in models or optimization, can be implemented in a few lines of code to yield reasonable results, and it does not require any extra computational steps besides what is already being computed by optimizers, without introducing any new hyperparameter. We hope our…
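The idea in the abstract can be sketched on a toy problem. The example below is a minimal sketch under stated assumptions, not the authors' implementation: it hand-rolls Adam on single-example gradients of a Gaussian negative log-likelihood, keeps the second-moment estimate that would normally be discarded, and turns it into a diagonal Gaussian posterior. The scaling `n_data * v_hat + prior_precision`, the prior precision value, the known noise level, and all function names are hypothetical choices made for illustration.

```python
import math
import random

NOISE_SD = 0.1  # assumed known observation noise for the toy likelihood


def adam_with_second_moment(grad_fn, theta, n_data, steps=10000, lr=0.01,
                            beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Run plain Adam on single-example gradients.

    Returns the final parameters and the bias-corrected second-moment
    estimate, which a standard training loop would throw away.
    """
    rng = random.Random(seed)
    theta = list(theta)
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    for t in range(1, steps + 1):
        g = grad_fn(theta, rng.randrange(n_data))
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * gi
            v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat_i = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat_i) + eps)
    v_hat = [vi / (1 - beta2 ** steps) for vi in v]
    return theta, v_hat


def laplace_variances(v_hat, n_data, prior_precision=1.0):
    """Diagonal posterior variances.

    ASSUMPTION: precision_i = n_data * v_hat_i + prior_precision, i.e. the
    per-example squared-gradient average is treated as a diagonal Fisher
    estimate and scaled by the dataset size.
    """
    return [1.0 / (n_data * vi + prior_precision) for vi in v_hat]


def sample_weights(mean, variances, rng):
    """Draw one weight vector from the diagonal Gaussian posterior."""
    return [rng.gauss(mu, math.sqrt(var)) for mu, var in zip(mean, variances)]


# Toy data: y = 2x - 0.5 plus Gaussian noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 * x - 0.5 + data_rng.gauss(0.0, NOISE_SD) for x in xs]


def grad_fn(theta, idx):
    """Gradient of the Gaussian negative log-likelihood for one example."""
    w, b = theta
    err = (w * xs[idx] + b - ys[idx]) / NOISE_SD ** 2
    return [err * xs[idx], err]


mean, v_hat = adam_with_second_moment(grad_fn, [0.0, 0.0], n_data=len(xs))
variances = laplace_variances(v_hat, n_data=len(xs), prior_precision=1.0)
draws = [sample_weights(mean, variances, random.Random(s)) for s in range(5)]

print("posterior mean:", mean)
print("posterior variances:", variances)
for d in draws:
    print("sampled weights:", d)
```

Predictive uncertainty would then come from averaging model outputs over the sampled weight vectors. In PyTorch, the same quantity is stored per parameter in the optimizer state, under the key `exp_avg_sq` for `torch.optim.Adam` and `square_avg` for `torch.optim.RMSprop`, so no extra pass over the data is needed to read it out after training.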
Taxonomy
Topics: Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference · Sparse and Compressive Sensing Techniques
Methods: Adam · AdaGrad
