Variational Gaussian Dropout is not Bayesian
Jiri Hron, Alexander G. de G. Matthews, Zoubin Ghahramani

TL;DR
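This paper critically examines variational Gaussian dropout and shows that its Bayesian interpretation is flawed: the log-uniform prior does not generally induce a proper posterior, and the correlated weight noise approximation leads to either an infinite objective or a high risk of overfitting. It then offers a non-Bayesian analysis of the objective with exact gradient computation.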
Contribution
It shows that Bayesian inference under the log-uniform prior used in variational Gaussian dropout is ill-posed, and it provides a previously unknown analytical form of the objective function.
Findings
The log-uniform prior does not generally induce a proper posterior.
The correlated weight noise approximation leads to either an infinite objective or a high risk of overfitting.
The additive reparametrisation introduces new minima.
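The first finding can be illustrated with a toy calculation. Under the log-uniform prior p(w) ∝ 1/|w|, the posterior normaliser for a single weight is Z = ∫ p(y | w) / |w| dw. If the likelihood stays positive at w = 0 (as in ordinary regression, where a zero weight still explains the data with some probability), the integrand behaves like p(y | 0) / |w| near the origin and Z diverges logarithmically, so no proper posterior exists. The minimal sketch below assumes a hypothetical one-weight model y = w·x + Gaussian noise with made-up values x = 1 and y = 0.5, and computes Z over eps ≤ |w| ≤ 10 for shrinking eps. The model, the function names and the cutoffs are illustrative assumptions, not the paper's construction or proof.

```python
import math


def likelihood(w, x=1.0, y=0.5, noise_std=1.0):
    """Gaussian likelihood p(y | w) for the hypothetical toy model y = w * x + noise."""
    r = y - w * x
    return math.exp(-0.5 * (r / noise_std) ** 2) / (noise_std * math.sqrt(2.0 * math.pi))


def truncated_normaliser(eps, upper=10.0, n=20000):
    """Approximate Z(eps) = integral of p(y | w) / |w| over eps <= |w| <= upper.

    Substituting |w| = exp(u) turns dw / |w| into du, so we integrate
    p(y | w) + p(y | -w) over u in [log(eps), log(upper)] with the midpoint rule.
    The prior 1/|w| is left unnormalised because it cannot be normalised.
    """
    lo, hi = math.log(eps), math.log(upper)
    du = (hi - lo) / n
    total = 0.0
    for i in range(n):
        w = math.exp(lo + (i + 0.5) * du)
        # Positive and negative weights both contribute to the integral.
        total += (likelihood(w) + likelihood(-w)) * du
    return total


if __name__ == "__main__":
    # The log-uniform prior puts unbounded mass near w = 0, so Z(eps) keeps growing.
    for eps in (1e-2, 1e-4, 1e-6, 1e-8):
        print(f"eps = {eps:.0e}   Z(eps) = {truncated_normaliser(eps):.4f}")
```

Once eps is small, each tenfold decrease adds roughly 2 · p(y | 0) · ln 10 ≈ 1.62 to Z, which is the logarithmic divergence in action: there is no finite limit to normalise by.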
Abstract
Gaussian multiplicative noise is commonly used as a stochastic regularisation technique in training of deterministic neural networks. A recent paper reinterpreted the technique as a specific algorithm for approximate inference in Bayesian neural networks; several extensions ensued. We show that the log-uniform prior used in all the above publications does not generally induce a proper posterior, and thus Bayesian inference in such models is ill-posed. Independent of the log-uniform prior, the correlated weight noise approximation has further issues leading to either infinite objective or high risk of overfitting. The above implies that the reported sparsity of obtained solutions cannot be explained by Bayesian or the related minimum description length arguments. We thus study the objective from a non-Bayesian perspective, provide its previously unknown analytical form which allows exact…
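For reference, independent per-weight Gaussian multiplicative noise multiplies each weight θ by ξ ~ N(1, α), so the implied weight distribution is q(w) = N(θ, αθ²). The additive reparametrisation is assumed here to mean sampling w = θ + σε with ε ~ N(0, 1) and σ² = αθ², which gives the same distribution whenever θ ≠ 0. The minimal stdlib-only sketch below illustrates both samplers under those assumptions; the function names are hypothetical, the correlated noise variant is not covered, and nothing here reproduces the paper's analytical objective.

```python
import math
import random
import statistics


def multiplicative_sample(theta, alpha, rng):
    # Gaussian multiplicative noise: w = theta * xi, with xi ~ N(1, alpha).
    xi = rng.gauss(1.0, math.sqrt(alpha))
    return theta * xi


def additive_sample(theta, sigma, rng):
    # Additive parametrisation (assumed form): w = theta + sigma * eps, eps ~ N(0, 1).
    return theta + sigma * rng.gauss(0.0, 1.0)


def alpha_to_sigma(theta, alpha):
    # Both samplers give q(w) = N(theta, alpha * theta**2) when sigma = sqrt(alpha) * |theta|.
    return math.sqrt(alpha) * abs(theta)


def sigma_to_alpha(theta, sigma):
    # Inverse map; raises ZeroDivisionError at theta = 0, where sigma > 0 would need alpha = infinity.
    return (sigma / theta) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    theta, alpha = 0.8, 0.5
    sigma = alpha_to_sigma(theta, alpha)
    mult = [multiplicative_sample(theta, alpha, rng) for _ in range(100_000)]
    add = [additive_sample(theta, sigma, rng) for _ in range(100_000)]
    # Both empirical variances should be close to alpha * theta**2 = 0.32.
    print("multiplicative:", statistics.fmean(mult), statistics.variance(mult))
    print("additive:      ", statistics.fmean(add), statistics.variance(add))
```

One visible difference between the two parametrisations: the additive form can represent θ = 0 with σ > 0, which corresponds to α = ∞ in the multiplicative form (sigma_to_alpha divides by θ). This is only a remark about the parameter spaces; the paper's own argument about new minima is not reproduced here.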
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Forecasting Techniques and Applications · Data Stream Mining Techniques
