A Simple Baseline for Bayesian Uncertainty in Deep Learning

Wesley Maddox; Timur Garipov; Pavel Izmailov; Dmitry Vetrov; Andrew Gordon Wilson

arXiv:1902.02476 · cs.LG · January 1, 2020 · 182 citations

TL;DR

SWAG offers a simple, scalable method for uncertainty estimation in deep learning by fitting a Gaussian to SGD iterates, improving calibration and out-of-sample detection across various tasks.

Contribution

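The paper introduces SWAG, which extends Stochastic Weight Averaging (SWA) by fitting a Gaussian to the SGD iterates: the SWA solution serves as the mean, and a low-rank plus diagonal covariance is estimated from the same iterates. The result is an approximate posterior over neural network weights that can be sampled for Bayesian model averaging.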

Findings

1. Empirically, SWAG approximates the shape of the true posterior distribution over neural network weights.

2. SWAG outperforms popular methods such as MC dropout and SGLD in calibration and out-of-sample detection.

3. The method is scalable and effective across diverse deep learning tasks.

Abstract

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety…
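
The procedure described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `SwagMoments`, `collect`, `sample`, `bma_predict`, `predict_fn`, and `max_rank` are hypothetical, the "SGD iterates" are synthetic noisy vectors rather than weights from a real training run, and the scaling constants in `sample` follow the commonly cited form of the SWAG sampling rule and should be checked against the paper.

```python
import numpy as np


class SwagMoments:
    """Running SWAG statistics over flattened weight vectors (illustrative)."""

    def __init__(self, dim, max_rank=20):
        self.n = 0
        self.mean = np.zeros(dim)      # first moment (the SWA solution)
        self.sq_mean = np.zeros(dim)   # uncentered second moment
        self.max_rank = max_rank       # rank K of the low-rank term (assumed name)
        self.deviations = []           # last K deviations from the running mean

    def collect(self, theta):
        """Update the moments with one iterate, e.g. once per epoch."""
        theta = np.asarray(theta, dtype=float)
        self.n += 1
        self.mean += (theta - self.mean) / self.n
        self.sq_mean += (theta ** 2 - self.sq_mean) / self.n
        self.deviations.append(theta - self.mean)
        if len(self.deviations) > self.max_rank:
            self.deviations.pop(0)

    def sample(self, rng):
        """Draw weights from N(mean, (diag + low_rank) / 2).

        Assumes collect() has been called at least twice.
        """
        # Clip guards against tiny negative variances from round-off.
        var = np.clip(self.sq_mean - self.mean ** 2, 0.0, None)
        d_mat = np.stack(self.deviations, axis=1)  # shape (dim, K)
        k = d_mat.shape[1]
        z1 = rng.standard_normal(self.mean.shape[0])
        z2 = rng.standard_normal(k)
        diag_part = np.sqrt(var) * z1 / np.sqrt(2.0)
        low_rank_part = d_mat @ z2 / np.sqrt(2.0 * max(k - 1, 1))
        return self.mean + diag_part + low_rank_part


def bma_predict(swag, predict_fn, x, n_samples, rng):
    """Bayesian model averaging: average predictions over sampled weights."""
    preds = [predict_fn(swag.sample(rng), x) for _ in range(n_samples)]
    return np.mean(preds, axis=0)


# Toy usage: synthetic "iterates" scattered around a fixed weight vector.
rng = np.random.default_rng(0)
swag = SwagMoments(dim=3, max_rank=5)
for _ in range(50):
    swag.collect(np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(3))


def predict_fn(theta, x):
    # Hypothetical model: logistic regression with weights theta.
    return 1.0 / (1.0 + np.exp(-(x @ theta)))


x = rng.standard_normal((4, 3))
probs = bma_predict(swag, predict_fn, x, n_samples=30, rng=rng)
```

Keeping only running moments and the last few deviations means memory grows with the rank of the low-rank term rather than with the number of collected iterates, which is one plausible reading of why the abstract calls the approach scalable.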

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Methods: Stochastic Weight Averaging · Stochastic Gradient Descent