SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient

Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, and Mohammad Emtiyaz Khan

arXiv:1811.04504·cs.LG·January 15, 2019·27 cites

Open Access · 2 Repos

TL;DR

SLANG is a fast, low-rank, approximate natural-gradient method for variational inference in deep models. It targets better uncertainty estimates than diagonal (mean-field) covariance approximations while needing only back-propagated gradients of the network log-likelihood.

Contribution

The paper proposes a stochastic, low-rank, approximate natural-gradient method that estimates a "diagonal plus low-rank" covariance structure for large deep models using only gradient information.

Findings

01  Faster and more accurate uncertainty estimation than mean-field methods

02  Comparable performance to state-of-the-art Bayesian inference techniques

03  Requires fewer gradient computations than methods that compute the gradient of the whole variational objective

Abstract

Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution. In such situations, existing methods usually resort to a diagonal approximation of the covariance matrix, despite the fact that these matrices are known to result in poor uncertainty estimates. To address this issue, we propose a new stochastic, low-rank, approximate natural-gradient (SLANG) method for variational inference in large, deep models. Our method estimates a "diagonal plus low-rank" structure based solely on back-propagated gradients of the network log-likelihood. This requires strictly fewer gradient computations than methods that compute the gradient of the whole variational objective. Empirical evaluations on standard benchmarks confirm that SLANG enables faster and more accurate estimation of…
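
To illustrate why a "diagonal plus low-rank" structure is cheap to work with, here is a minimal NumPy sketch. It is not the SLANG algorithm itself, only the standard Woodbury identity that makes such a structure tractable. It assumes, for illustration, that the structure is placed on a precision matrix of the form diag(d) + U Uᵀ; the names `woodbury_solve`, `d`, `U`, and the problem sizes are hypothetical and do not come from the page above.

```python
import numpy as np

def woodbury_solve(d, U, v):
    """Return (diag(d) + U @ U.T)^{-1} @ v without forming the n x n matrix.

    d : (n,) positive diagonal entries (assumed name)
    U : (n, r) low-rank factor with r << n (assumed name)
    v : (n,) right-hand side
    """
    Dinv_v = v / d                 # D^{-1} v, elementwise
    Dinv_U = U / d[:, None]        # D^{-1} U, row-wise scaling
    # Small r x r system: I + U^T D^{-1} U
    K = np.eye(U.shape[1]) + U.T @ Dinv_U
    # Woodbury: D^{-1} v - D^{-1} U K^{-1} U^T D^{-1} v
    return Dinv_v - Dinv_U @ np.linalg.solve(K, U.T @ Dinv_v)

# Compare against a dense solve on a small synthetic problem.
rng = np.random.default_rng(0)
n, r = 200, 5
d = rng.uniform(0.5, 2.0, size=n)
U = rng.normal(size=(n, r))
v = rng.normal(size=n)

fast = woodbury_solve(d, U, v)
dense = np.linalg.solve(np.diag(d) + U @ U.T, v)
max_abs_err = float(np.max(np.abs(fast - dense)))
```

The structured solve only inverts an r × r matrix, so its cost grows linearly in the number of parameters n for fixed rank r, whereas the dense solve grows cubically in n.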

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning · Markov Chains and Monte Carlo Methods