BALI: Learning Neural Networks via Bayesian Layerwise Inference

Richard Kurle; Alexej Klushyn; Ralf Herbrich

arXiv:2411.12102·cs.LG·November 20, 2024

TL;DR

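BALI treats a neural network as a stack of Bayesian linear regression models and infers each layer's posterior exactly, given pseudo-targets obtained by updating the forward-pass layer outputs with backpropagated gradients. It converges in few iterations and matches or exceeds leading Bayesian neural network methods on regression, classification, and out-of-distribution detection.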

Contribution

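The paper presents a layerwise Bayesian inference method for neural networks in which each layer's posterior is computed exactly given pseudo-targets for that layer. Mini-batch training is supported through an exponential moving average over natural-parameter terms, and the method performs as well as or better than leading Bayesian neural network methods.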

Findings

01 Performs as well as or better than leading Bayesian neural network methods.

02 Converges in few iterations.

03 Effective in regression, classification, and out-of-distribution detection.

Abstract

We introduce a new method for learning Bayesian neural networks, treating them as a stack of multivariate Bayesian linear regression models. The main idea is to infer the layerwise posterior exactly if we know the target outputs of each layer. We define these pseudo-targets as the layer outputs from the forward pass, updated by the backpropagated gradients of the objective function. The resulting layerwise posterior is a matrix-normal distribution with a Kronecker-factorized covariance matrix, which can be efficiently inverted. Our method extends to the stochastic mini-batch setting using an exponential moving average over natural-parameter terms, thus gradually forgetting older data. The method converges in few iterations and performs as well as or better than leading Bayesian neural network methods on various regression, classification, and out-of-distribution detection benchmarks.
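
The layerwise update described in the abstract can be illustrated with a minimal NumPy sketch for a single linear layer. It is not the authors' implementation. The names `pseudo_targets` and `LayerPosterior`, the step size, the zero-mean prior with a scaled-identity row covariance, and the exact discounting rule (`decay * old + new`) are all assumptions made for illustration. The posterior formula itself is the standard conjugate result for multivariate Bayesian linear regression with a matrix-normal prior.

```python
import numpy as np


def pseudo_targets(outputs, grad_outputs, step_size=1.0):
    """Move forward-pass outputs one gradient step downhill (assumed update form)."""
    return outputs - step_size * grad_outputs


class LayerPosterior:
    """Matrix-normal posterior over the weights W (d_in x d_out) of one linear layer.

    Assumes the prior W ~ MN(0, I / prior_precision, V) and output noise with the
    same column covariance V, so the posterior row covariance does not depend on V.
    """

    def __init__(self, d_in, d_out, prior_precision=1.0, decay=0.9):
        self.prior_precision = prior_precision
        self.decay = decay                   # hypothetical forgetting factor
        self.xtx = np.zeros((d_in, d_in))    # discounted sum of X^T X
        self.xtz = np.zeros((d_in, d_out))   # discounted sum of X^T Z

    def update(self, x, z):
        """Fold in one mini-batch of layer inputs x and pseudo-targets z."""
        self.xtx = self.decay * self.xtx + x.T @ x
        self.xtz = self.decay * self.xtz + x.T @ z

    def posterior(self):
        """Return the posterior mean and the posterior row covariance."""
        d_in = self.xtx.shape[0]
        precision = self.prior_precision * np.eye(d_in) + self.xtx
        row_cov = np.linalg.inv(precision)
        mean = np.linalg.solve(precision, self.xtz)
        return mean, row_cov


# Toy data: noise-free linear targets for a single output layer.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
w_true = rng.normal(size=(3, 2))
y = x @ w_true

# Forward pass with zero weights, then one gradient step on the outputs.
outputs = x @ np.zeros((3, 2))
grad = outputs - y                  # gradient of 0.5 * ||outputs - y||^2
z = pseudo_targets(outputs, grad)   # with step size 1 this equals y

# decay=0.0 keeps only the latest batch, i.e. plain full-batch inference.
layer = LayerPosterior(d_in=3, d_out=2, prior_precision=1e-6, decay=0.0)
layer.update(x, z)
mean, row_cov = layer.posterior()
```

In this toy case the squared-error gradient with a unit step makes the pseudo-targets equal to the true targets, so the posterior mean recovers `w_true` up to the small prior shrinkage. With `decay` between 0 and 1, older mini-batches contribute less to the accumulated terms, which is one simple way to realise the gradual forgetting the abstract mentions.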

Taxonomy

Topics: Neural Networks and Applications

Methods: Linear Regression