Learning to Explore for Stochastic Gradient MCMC
SeungHyun Kim, Seohyeon Jung, Seonghyeon Kim, Juho Lee

TL;DR
This paper introduces a meta-learning approach to enhance stochastic gradient MCMC methods, enabling faster and more efficient exploration of complex, multi-modal posterior distributions in Bayesian neural networks, with demonstrated improvements on image classification tasks.
Contribution
The paper presents a novel meta-learning strategy for SGMCMC that improves exploration efficiency and transferability across tasks in high-dimensional Bayesian inference.
Findings
Significantly faster exploration of multi-modal posteriors.
Improved sampling efficiency over vanilla SGMCMC.
Effective transfer of exploration capabilities to unseen tasks.
Abstract
Bayesian Neural Networks (BNNs) with high-dimensional parameters pose a challenge for posterior inference due to the multi-modality of the posterior distributions. Stochastic Gradient MCMC (SGMCMC) with cyclical learning rate scheduling is a promising solution, but it requires a large number of sampling steps to explore high-dimensional multi-modal posteriors, making it computationally expensive. In this paper, we propose a meta-learning strategy to build an SGMCMC sampler that can efficiently explore multi-modal target distributions. Our algorithm allows the learned SGMCMC to quickly explore the high-density regions of the posterior landscape. We also show that this exploration property is transferable to various tasks, even ones unseen during the meta-training stage. Using popular image classification benchmarks and a variety of downstream tasks, we demonstrate that our method…
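The abstract refers to cyclical learning rate scheduling without giving the schedule. As a minimal sketch, assuming the cosine form commonly used for cyclical SGMCMC (the function name and its arguments are illustrative, not taken from the paper), the step size restarts at its base value at the start of each cycle and decays toward zero within it:

```python
import math


def cyclical_step_size(k, total_steps, num_cycles, base_step):
    """Cosine cyclical step size at 1-based iteration k (assumed schedule)."""
    # Length of one cycle, rounded up so the cycles cover all steps.
    cycle_len = math.ceil(total_steps / num_cycles)
    # Relative position inside the current cycle, in [0, 1).
    pos = ((k - 1) % cycle_len) / cycle_len
    # Starts at base_step and decays toward 0 by the end of the cycle.
    return 0.5 * base_step * (math.cos(math.pi * pos) + 1.0)


# Step sizes over 100 iterations split into 4 cycles of 25 steps each.
schedule = [cyclical_step_size(k, 100, 4, 0.1) for k in range(1, 101)]
```

Under this assumed schedule, the large step sizes early in a cycle favor exploration and the small ones late in a cycle favor sampling near a mode.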
Peer Reviews
Decision: ICML 2024 Poster
- The approach is well motivated by the previous successes in meta-learning. Knowledge-sharing between different SGMCMC chains across different multi-modal distributions across similar tasks should be explored in detail.
- The authors have done a good job at experimentation overall. The empirical analysis spans understanding the diversity of MCMC chains as well as looking at the downstream tasks.
- I really like the idea of parameterizing the gradients of kinetic energy function - it seems i
- While the experimental results are useful, I think the paper also needs an ablation study. We need to understand the impact of the parameterized gradients vs. the transferability of the posterior information across the tasks.
- Understanding the compute requirement at train time is equally important. We don't know how much training time is required per step and in total, or how it compares with the other baselines that the authors have compared against. It'll also be useful to know the additional # of
1. The methodology of the proposed method is simple, which makes it a practical method for many tasks.
2. The experiments and ablation studies are comprehensive and cover many aspects of sampling, including predictive accuracy, uncertainty quantification, convergence diagnostic
1. The paper does not mention at the beginning that meta-learning methods for SGMCMC already exist, such as Gong et al. Only in Section 3.1 do the authors first briefly mention that paper. The presentation is misleading and may give the impression that this paper is the first to study meta-learning for SGMCMC.
2. More importantly, the proposed method is essentially very similar to Gong et al., which uses the same formulation for the SGMCMC class, but with slightly different parameteriz
- The authors take a different perspective on meta-learned SGMCMC: instead of meta-learning the diffusion matrix $D$ and curl matrix $Q$, as prior work did, they meta-learn the kinetic energy term.
- The meta-learning procedure is designed to be operationally fairly simple, and can be relatively easily accommodated into existing SGMCMC pipelines.
- I think the first and biggest weakness of the work lies in the way it is framed. It seems that you are trying to compete with deep ensembles and other SGMCMC variants. In fact, the message I get from the experimental results is that L2E performs pretty much worse in most cases, and at a worse computational cost. But one could argue that the promise of the method lies rather in the generalizability of the meta-learning procedure to unseen datasets. In a sense, it is pretraining for SGMCM
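The reviews above contrast meta-learning the matrices $D$ and $Q$ with meta-learning the kinetic energy term. To make that distinction concrete, here is a minimal sketch of an SGHMC-style update in which the kinetic-energy gradient is a pluggable function. It is not the paper's algorithm: `sghmc_sample`, `grad_kinetic`, the scalar parameter, and the step and friction values are all illustrative assumptions. With the quadratic kinetic energy used here, it reduces to a plain Euler-discretized SGHMC step.

```python
import math
import random


def sghmc_sample(grad_potential, grad_kinetic, theta0, n_steps,
                 step=0.05, friction=1.0, seed=0):
    """Euler-discretized SGHMC-style sampler for a scalar parameter.

    grad_potential(theta): gradient of the potential U (may be noisy).
    grad_kinetic(r): gradient of the kinetic energy K. This is a
    hypothetical hook where a learned function could be substituted.
    """
    rng = random.Random(seed)
    theta, r = theta0, 0.0
    noise_std = math.sqrt(2.0 * friction * step)
    samples = []
    for _ in range(n_steps):
        # Momentum: potential force, friction along grad K, injected noise.
        r += -step * (grad_potential(theta) + friction * grad_kinetic(r))
        r += noise_std * rng.gauss(0.0, 1.0)
        # Position moves along the kinetic-energy gradient.
        theta += step * grad_kinetic(r)
        samples.append(theta)
    return samples


# Standard Gaussian target U(theta) = theta^2 / 2 with K(r) = r^2 / 2.
samples = sghmc_sample(lambda t: t, lambda r: r, theta0=3.0, n_steps=20000)
```

Replacing the second lambda with a different function changes how momentum translates into parameter movement while leaving the friction and noise structure untouched. That is the design freedom the reviewers describe.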
Taxonomy
Topics: Advanced Memory and Neural Computing · Ion-surface interactions and analysis · Metal and Thin Film Mechanics
