Direct Bethe Free Energy Minimization for Bayesian Neural Network
Pavel Prochazka

TL;DR
This paper introduces a novel training method for Bayesian neural networks by directly minimizing the Bethe free energy, enabling efficient joint optimization of weights and hyperparameters without cross-validation.
Contribution
It presents a new Bethe free energy-based training framework that handles probabilistic and deterministic subgraphs, with analytical tractability and empirical Bayes hyperparameter optimization.
Findings
With the weight posterior restricted to a last-layer Gaussian, the Bethe loss equals the exact marginal likelihood for a Gaussian likelihood.
The method is competitive with standard approaches on multiple benchmarks.
Joint empirical Bayes optimization eliminates the need for outer hyperparameter tuning.
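As a reference point for the Gaussian case, the standard marginal likelihood of a linear-Gaussian last layer can be written out explicitly. The notation is an assumption made for illustration and is not taken from the paper: \Phi is the N-row matrix of last-layer features, \mathbf{m}_0 and S_0 are the prior mean and covariance of the last-layer weights, and \sigma^2 is the noise variance. Empirical Bayes in this setting would typically mean minimizing the negative log evidence over the hyperparameters (here \mathbf{m}_0, S_0, \sigma^2) together with the feature-extractor weights.

```latex
\begin{aligned}
p(\mathbf{y}\mid X)
  &= \int \mathcal{N}(\mathbf{y};\,\Phi\mathbf{w},\,\sigma^2 I)\,
          \mathcal{N}(\mathbf{w};\,\mathbf{m}_0,\,S_0)\,d\mathbf{w}
   = \mathcal{N}(\mathbf{y};\,\Phi\mathbf{m}_0,\,C),
  \qquad C = \sigma^2 I + \Phi S_0 \Phi^\top, \\
-\log p(\mathbf{y}\mid X)
  &= \tfrac{1}{2}(\mathbf{y}-\Phi\mathbf{m}_0)^\top C^{-1}(\mathbf{y}-\Phi\mathbf{m}_0)
   + \tfrac{1}{2}\log\det C
   + \tfrac{N}{2}\log 2\pi .
\end{aligned}
```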
Abstract
We propose training Bayesian neural networks by directly minimizing the Bethe free energy rather than maximizing a variational lower bound. On tree-structured factor graphs the Bethe free energy is exact; deterministic layers drop out of the objective and are trained by standard backpropagation, so the framework accommodates any mixture of probabilistic and deterministic subgraphs without modification. Restricting the weight posterior to a last-layer Gaussian yields analytically tractable losses: for a Gaussian likelihood the Bethe loss equals the exact marginal likelihood, and for a probit likelihood it reduces to a closed form via the probit-Gaussian convolution. Both objectives sit strictly between MAP and the ELBO, removing the structural Jensen gap that no choice of variational family can close. The Z-consistent prior…
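The probit-Gaussian convolution mentioned in the abstract is a standard identity: for a pre-activation a distributed as N(mu, sigma^2), the expected probit output is Phi(mu / sqrt(1 + sigma^2)). The sketch below checks that identity numerically using only the Python standard library. The function names and the trapezoid-rule check are illustrative assumptions, and how the paper assembles this term into its loss is not shown in the text above.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_gaussian_closed_form(mu, sigma):
    """E[Phi(a)] for a ~ N(mu, sigma^2), using the convolution identity."""
    return std_normal_cdf(mu / math.sqrt(1.0 + sigma ** 2))


def probit_gaussian_numeric(mu, sigma, n=20001, width=10.0):
    """Same expectation by trapezoid-rule quadrature (requires sigma > 0)."""
    lo = mu - width * sigma
    hi = mu + width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        a = lo + i * h
        density = math.exp(-0.5 * ((a - mu) / sigma) ** 2) / (
            sigma * math.sqrt(2.0 * math.pi)
        )
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * std_normal_cdf(a) * density
    return total * h


def probit_nll(y, mu, sigma):
    """Negative log predictive probability for a label y in {-1, +1}.

    Hypothetical helper: one way a closed-form probit loss term could look.
    """
    return -math.log(probit_gaussian_closed_form(y * mu, sigma))


if __name__ == "__main__":
    mu, sigma = 0.7, 1.3
    print(probit_gaussian_closed_form(mu, sigma))
    print(probit_gaussian_numeric(mu, sigma))
```

The two printed values should agree to several decimal places, which is what makes a probit likelihood analytically tractable under a Gaussian posterior on the pre-activation.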
