ELBOing Stein: Variational Bayes with Stein Mixture Inference
Ola Rønning, Eric Nalisnick, Christophe Ley, Padhraic Smyth, Thomas Hamelryck

TL;DR
This paper introduces Stein Mixture Inference (SMI), a novel variational Bayesian inference method that extends Stein variational gradient descent (SVGD) with mixture models to better estimate uncertainty, particularly in small Bayesian neural networks (BNNs).
Contribution
SMI generalizes SVGD by letting each particle parameterize a component of a mixture model through a user-specified guide. This avoids variance collapse and requires fewer particles for accurate uncertainty estimates.
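To make the mixture idea concrete, here is a minimal sketch that evaluates the ELBO of a particle-parameterized mixture guide. It assumes, purely for illustration, diagonal Gaussian components with uniform weights 1/m, so that particle i holds a mean `mus[i]` and a scale `sigmas[i]`. The function names are hypothetical, the component family is our choice rather than necessarily the authors', and the code only estimates the objective. It does not implement the SMI particle update.

```python
import numpy as np

def log_normal(x, mu, sigma):
    """Diagonal Gaussian log-density log N(x; mu, diag(sigma^2)), summed over the last axis."""
    z = (x - mu) / sigma
    return np.sum(-0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * z ** 2, axis=-1)

def mixture_log_q(theta, mus, sigmas):
    """log q(theta) for the uniform mixture q = (1/m) * sum_i N(mu_i, diag(sigma_i^2)).

    theta: (s, d) samples; mus, sigmas: (m, d), one row per particle.
    """
    comp = log_normal(theta[:, None, :], mus[None, :, :], sigmas[None, :, :])  # (s, m)
    top = comp.max(axis=1, keepdims=True)  # log-sum-exp trick for numerical stability
    return top[:, 0] + np.log(np.exp(comp - top).sum(axis=1)) - np.log(mus.shape[0])

def mixture_elbo(log_p, mus, sigmas, num_samples, rng):
    """Monte Carlo estimate of E_q[log p(theta) - log q(theta)] for the mixture guide."""
    m, d = mus.shape
    idx = rng.integers(m, size=num_samples)        # pick a component uniformly at random
    eps = rng.standard_normal((num_samples, d))
    theta = mus[idx] + sigmas[idx] * eps           # draw from the chosen component
    return np.mean(log_p(theta) - mixture_log_q(theta, mus, sigmas))
```

As a sanity check, if `log_p` is a normalized standard normal and every component equals N(0, I), the mixture matches the target exactly and the estimate is 0 (the log evidence of a normalized density). Moving the components to narrow modes at (-3, -3) and (3, 3) makes the estimate strongly negative, reflecting the large KL divergence from the target.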
Findings
SMI outperforms SVGD in avoiding variance collapse.
SMI requires fewer particles than SVGD for small BNNs.
SMI performs well on standard datasets.
Abstract
Stein variational gradient descent (SVGD) [Liu and Wang, 2016] performs approximate Bayesian inference by representing the posterior with a set of particles. However, SVGD suffers from variance collapse, i.e., poor predictions due to underestimated uncertainty [Ba et al., 2021], even for moderately dimensional models such as small Bayesian neural networks (BNNs). To address this issue, we generalize SVGD by letting each particle parameterize a component distribution in a mixture model. Our method, Stein Mixture Inference (SMI), optimizes a lower bound to the evidence (ELBO) and introduces user-specified guides parameterized by particles. SMI extends the Nonlinear SVGD framework [Wang and Liu, 2019] to the case of variational Bayes. SMI effectively avoids variance collapse, judging by a previously described test developed for this purpose, and performs well on standard data sets. In…
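For reference, the standard SVGD update of Liu and Wang (2016) moves each particle x_i along φ(x_i) = (1/n) Σ_j [k(x_j, x_i) ∇ log p(x_j) + ∇_{x_j} k(x_j, x_i)]: a kernel-weighted score term that pulls particles toward high density, plus a kernel-gradient term that pushes them apart. The sketch below assumes an RBF kernel whose bandwidth is set by the median heuristic. The function names are hypothetical, and this is plain SVGD, the baseline that SMI generalizes, not SMI itself.

```python
import numpy as np

def rbf_kernel(x):
    """RBF kernel matrix and its gradient with respect to the first argument.

    x: (n, d) particles; bandwidth h chosen by the median heuristic.
    Returns K with K[j, i] = k(x_j, x_i) and grad_K with
    grad_K[j, i] = d k(x_j, x_i) / d x_j.
    """
    diff = x[:, None, :] - x[None, :, :]            # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)            # (n, n)
    n = x.shape[0]
    h = np.median(sq_dist) / np.log(n + 1) + 1e-8   # median heuristic, guarded against 0
    K = np.exp(-sq_dist / h)
    grad_K = -2.0 / h * diff * K[:, :, None]        # (n, n, d)
    return K, grad_K

def svgd_step(x, grad_log_p, step_size):
    """One SVGD update: x_i <- x_i + step_size * phi(x_i)."""
    n = x.shape[0]
    K, grad_K = rbf_kernel(x)
    scores = grad_log_p(x)                           # (n, d) score at every particle
    # Attractive term sum_j K[j, i] * scores[j] plus repulsive term sum_j grad_K[j, i].
    phi = (K.T @ scores + grad_K.sum(axis=0)) / n
    return x + step_size * phi
```

For a 1-D Gaussian target N(2, 1), with score `lambda z: -(z - 2.0)`, repeatedly calling `svgd_step` on particles initialized around -5 moves them onto the target and spreads them to roughly unit variance. The variance collapse described in the abstract appears in higher dimensions with few particles, which this toy example does not exercise.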
Peer Reviews
Decision: ICLR 2025 Poster
- The method is a straightforward and effective extension of SVGD/NSVGD.
- The paper is well written and easy to follow, as is the provided codebase.
- The experiments are rather small-scale and limited to regression data sets. Their aim seems to be primarily to demonstrate the relative performance of the proposed approach compared to prior SVGD-related approaches rather than its absolute performance. Among the baselines, at least a comparison against HMC on the UCI data sets would have been nice, to see how close the method comes to it (or whether it improves upon it).
- The paper lacks ablations to evaluate what happens as an underlying
1. The problem addressed in this paper is both important and compelling. Traditional approaches like ordinary mean-field variational inference (OVI) and Stein Variational Gradient Descent (SVGD) often experience variance collapse, whereas SMI provides more accurate variance estimates, improving uncertainty quantification.
2. The paper is well written, providing a clear background and a thorough summary of related work. As someone slightly unfamiliar with the field, I particularly appreciated th
1. Variational inference offers a compelling alternative to sampling methods like MCMC due to its efficiency, especially in high-dimensional settings and with large-scale datasets. However, the current validation of SMI is limited to small to moderately sized models, which somewhat limits its appeal and persuasiveness for broader, large-scale applications.
2. The paper lacks theoretical insights or guidance on how SMI’s performance depends on the number of particles $m$. Providing recommendati
- The application of variational inference (VI) concepts to Stein Variational Gradient Descent (SVGD) appears novel and intriguing.
- The authors validate their VI-based approach through numerical experiments on several UCI benchmark datasets, demonstrating good performance. The results seem to suggest that this approach effectively mitigates the impact of variance collapse.
Insufficient Analysis of the Motivation Behind Extending SVGD with VI for Variance Collapse Mitigation:
- The main objective of this paper, as I understand it, is to mitigate variance collapse by extending the SVGD objective function through a combination of an ELBO-like objective from VI and the Non-linear SVGD framework. However, it is not entirely clear "why" this extension effectively mitigates variance collapse. While Figure 1 provides a conceptual illustration, it does not intuitively
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Bayesian Methods and Mixture Models · Music and Audio Processing
Methods: Sparse Evolutionary Training
