A Simple Baseline for Bayesian Uncertainty in Deep Learning
Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, Andrew Gordon Wilson

TL;DR
SWAG offers a simple, scalable method for uncertainty estimation in deep learning by fitting a Gaussian to SGD iterates, improving calibration and out-of-sample detection across various tasks.
Contribution
The paper introduces SWAG, a novel approach that combines SWA with a Gaussian approximation for Bayesian uncertainty in deep neural networks.
Findings
Empirically, SWAG approximates the shape of the true posterior distribution over neural network weights.
SWAG outperforms popular methods like MC dropout and SGLD in calibration and out-of-sample detection.
The method is scalable and effective across diverse deep learning tasks.
Abstract
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety…
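The procedure described in the abstract (collect SGD iterates, fit a Gaussian with the SWA mean and a low-rank plus diagonal covariance, then sample weights for Bayesian model averaging) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the class `SwagMoments`, the helper `bma_predict`, the `max_rank` and `scale` parameters, the scaling constants in `sample`, and the toy sigmoid model are all assumptions introduced here, and the "iterates" are simulated noise around a fixed point rather than weights from real training.

```python
import numpy as np


class SwagMoments:
    """Running moments of flattened weight vectors (hypothetical helper)."""

    def __init__(self, dim, max_rank=20):
        self.n = 0
        self.mean = np.zeros(dim)      # first moment (the SWA solution)
        self.sq_mean = np.zeros(dim)   # uncentered second moment
        self.max_rank = max_rank       # number of deviation vectors kept
        self.deviations = []           # columns of the low-rank factor

    def collect(self, theta):
        """Update the moments with one SGD iterate."""
        theta = np.asarray(theta, dtype=float)
        self.n += 1
        self.mean += (theta - self.mean) / self.n
        self.sq_mean += (theta ** 2 - self.sq_mean) / self.n
        # Deviation from the running mean; keep only the most recent ones.
        self.deviations.append(theta - self.mean)
        if len(self.deviations) > self.max_rank:
            self.deviations.pop(0)

    def sample(self, rng, scale=1.0):
        """Draw one weight vector from the fitted Gaussian."""
        # Diagonal part: variance estimate, clipped to stay positive.
        var = np.clip(self.sq_mean - self.mean ** 2, 1e-30, None)
        z1 = rng.standard_normal(self.mean.shape)
        theta = self.mean + np.sqrt(scale / 2.0 * var) * z1
        # Low-rank part: random combination of the stored deviations.
        k = len(self.deviations)
        if k > 1:
            dev = np.stack(self.deviations, axis=1)  # shape (dim, k)
            z2 = rng.standard_normal(k)
            theta = theta + np.sqrt(scale / (2.0 * (k - 1))) * (dev @ z2)
        return theta


def bma_predict(moments, predict_fn, x, n_samples=30, seed=0):
    """Average predictions over sampled weights (Bayesian model averaging)."""
    rng = np.random.default_rng(seed)
    preds = [predict_fn(moments.sample(rng), x) for _ in range(n_samples)]
    return np.mean(preds, axis=0)


def sigmoid_predict(theta, x):
    """Toy stand-in for a network: logistic regression on inputs x."""
    return 1.0 / (1.0 + np.exp(-(x @ theta)))


# Simulated "SGD iterates": noise around a fixed point (illustrative only).
rng = np.random.default_rng(0)
true_center = np.array([1.0, -2.0, 0.5])
moments = SwagMoments(dim=3, max_rank=5)
for _ in range(200):
    moments.collect(true_center + 0.1 * rng.standard_normal(3))

x = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
probs = bma_predict(moments, sigmoid_predict, x)
```

In a real setting, `collect` would be called on the flattened network weights at intervals late in training, and `predict_fn` would load each sampled vector back into the network before the forward pass. Networks with batch normalization would typically also need their normalization statistics recomputed for each sample, which this sketch omits.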
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Stochastic Weight Averaging · Stochastic Gradient Descent
