Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods

Laurence Aitchison

arXiv:1807.07540·stat.ML·April 17, 2020

TL;DR

This paper introduces AdaBayes, an optimizer derived by treating neural network optimization as Bayesian filtering. It unifies adaptive and non-adaptive methods, transitions automatically between SGD-like and Adam-like behavior, recovers AdamW, and generalizes competitively with SGD.

Contribution

It formulates neural network optimization as Bayesian filtering, enabling a unified framework that recovers AdamW and adapts between SGD and Adam behaviors.

Findings

01

AdaBayes transitions smoothly between SGD and Adam-like behavior.

02

AdaBayes automatically recovers AdamW with decoupled weight decay.

03

AdaBayes achieves generalization performance comparable to SGD.
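For readers unfamiliar with the "decoupled weight decay" mentioned in finding 02, the following is a minimal scalar sketch of a standard Adam step with the decay applied either through the gradient (L2 regularization) or directly to the weight (AdamW style). It is not the AdaBayes update and is not taken from the paper's repository; the function name `adam_step` and all hyperparameter values are assumptions chosen for illustration.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=True):
    """One scalar Adam step (illustrative helper, not the paper's code).

    Returns the updated (w, m, v). `t` is the 1-based step count.
    """
    if not decoupled:
        # L2 regularization: the decay term enters the gradient, so it is
        # later rescaled by the root-mean-square normalizer.
        g = g + weight_decay * w

    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g

    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)

    if decoupled:
        # Decoupled decay: shrink the weight directly, bypassing the normalizer.
        w_new -= lr * weight_decay * w

    return w_new, m, v


# With a zero loss gradient, only the decay term moves the weight.
w_decoupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.01,
                              weight_decay=0.1, decoupled=True)
w_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.01,
                       weight_decay=0.1, decoupled=False)
print(w_decoupled, w_l2)
```

In this toy case the decoupled step shrinks the weight by `lr * weight_decay * w`, while the L2 variant passes the decay through the normalizer and therefore takes a step of roughly `lr`, largely independent of the decay strength.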

Abstract

We formulate the problem of neural network optimization as Bayesian filtering, where the observations are the backpropagated gradients. While neural network optimization has previously been studied using natural gradient methods, which are closely related to Bayesian inference, they were unable to recover standard optimizers such as Adam and RMSprop with a root-mean-square gradient normalizer, instead getting a mean-square normalizer. To recover the root-mean-square normalizer, we find it necessary to account for the temporal dynamics of all the other parameters as they are being optimized. The resulting optimizer, AdaBayes, adaptively transitions between SGD-like and Adam-like behaviour, automatically recovers AdamW, a state-of-the-art variant of Adam with decoupled weight decay, and has generalisation performance competitive with SGD.
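The distinction the abstract draws between a root-mean-square and a mean-square gradient normalizer can be illustrated with a small scalar sketch. It only contrasts the two normalizers on a fixed gradient sequence; it does not implement AdaBayes or any natural gradient method, and the helper name `normalized_steps` and its default values are assumptions made for this example.

```python
import math


def normalized_steps(grads, lr=0.01, beta2=0.9, eps=1e-8):
    """Step sizes for the last gradient under two normalizers (illustrative).

    Returns (rms_step, ms_step): the gradient divided by the square root of
    the running mean of squared gradients (as in Adam and RMSprop), and the
    gradient divided by that running mean itself.
    """
    if not grads:
        raise ValueError("grads must be non-empty")

    # Exponential moving average of squared gradients.
    v = 0.0
    for g in grads:
        v = beta2 * v + (1 - beta2) * g * g

    # Bias correction for the zero-initialized average.
    v_hat = v / (1 - beta2 ** len(grads))

    g_last = grads[-1]
    rms_step = lr * g_last / (math.sqrt(v_hat) + eps)
    ms_step = lr * g_last / (v_hat + eps)
    return rms_step, ms_step


grads = [0.5, -0.2, 0.3]
scaled = [100.0 * g for g in grads]  # same gradients, 100x larger scale

rms_a, ms_a = normalized_steps(grads)
rms_b, ms_b = normalized_steps(scaled)
print(rms_a, rms_b)  # nearly identical
print(ms_a, ms_b)    # differ by roughly a factor of 100
```

Rescaling every gradient by a constant leaves the root-mean-square step essentially unchanged, whereas the mean-square step shrinks by about the same constant, which is one way to see that the two normalizers are not interchangeable.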

Code & Models

Repositories

LaurenceA/adabayes
PyTorch · Official

Videos

Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Gaussian Processes and Bayesian Inference

Methods: Stochastic Gradient Descent · AdamW · RMSProp · Adam