Efficient and Generalizable Tuning Strategies for Stochastic Gradient MCMC
Jeremie Coullon, Leah South, Christopher Nemeth

TL;DR
This paper introduces a bandit-based method for automatically tuning hyperparameters in stochastic gradient MCMC algorithms, improving their accuracy and practicality for Bayesian inference.
Contribution
It proposes a novel Stein discrepancy-based tuning algorithm supported by theoretical analysis and extensive experiments, addressing the lack of automated hyperparameter tuning in SGMCMC.
Findings
The method tunes SGMCMC hyperparameters such as step size and batch size on both simulated and real datasets.
It reduces the need for manual hyperparameter tuning.
Experimental results show improved posterior approximation accuracy.
Abstract
Stochastic gradient Markov chain Monte Carlo (SGMCMC) is a popular class of algorithms for scalable Bayesian inference. However, these algorithms include hyperparameters such as step size or batch size that influence the accuracy of estimators based on the obtained posterior samples. As a result, these hyperparameters must be tuned by the practitioner and currently no principled and automated way to tune them exists. Standard MCMC tuning methods based on acceptance rates cannot be used for SGMCMC, thus requiring alternative tools and diagnostics. We propose a novel bandit-based algorithm that tunes the SGMCMC hyperparameters by minimizing the Stein discrepancy between the true posterior and its Monte Carlo approximation. We provide theoretical results supporting this approach and assess various Stein-based discrepancies. We support our results with experiments on both simulated and real…
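The idea in the abstract, treating each hyperparameter setting as a bandit arm and scoring it by a Stein discrepancy computed from its samples, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy 1D standard normal target, the synthetic gradient noise, the RBF kernel, the use of successive halving as the bandit strategy, and all function names (`ksd_squared`, `sgld_steps`, `tune_step_size`).

```python
import math
import random


def ksd_squared(samples, score, lengthscale=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy (1D, RBF kernel)."""
    n = len(samples)
    inv_l2 = 1.0 / lengthscale ** 2
    scores = [score(x) for x in samples]
    total = 0.0
    for x, sx in zip(samples, scores):
        for y, sy in zip(samples, scores):
            r = x - y
            k = math.exp(-0.5 * r * r * inv_l2)
            # Stein kernel: s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d2k/dxdy
            total += k * (sx * sy + (sx - sy) * r * inv_l2
                          + inv_l2 - r * r * inv_l2 ** 2)
    return total / (n * n)


def sgld_steps(x, step, n_steps, noisy_grad, rng):
    """Run n_steps of stochastic gradient Langevin dynamics starting from x."""
    out = []
    for _ in range(n_steps):
        x = x + 0.5 * step * noisy_grad(x, rng) + math.sqrt(step) * rng.gauss(0.0, 1.0)
        out.append(x)
    return x, out


def tune_step_size(step_grid, score, noisy_grad, x0=5.0, steps_per_round=150, seed=0):
    """Successive halving over step sizes, ranking arms by KSD (illustrative only)."""
    rng = random.Random(seed)
    arms = [{"step": h, "x": x0, "samples": []} for h in step_grid]
    while len(arms) > 1:
        for arm in arms:
            # Each surviving arm continues its own chain for a fixed budget.
            arm["x"], new = sgld_steps(arm["x"], arm["step"], steps_per_round,
                                       noisy_grad, rng)
            arm["samples"].extend(new)
            arm["ksd2"] = ksd_squared(arm["samples"], score)
        arms.sort(key=lambda a: a["ksd2"])
        arms = arms[: max(1, len(arms) // 2)]  # keep the better half
    return arms[0]["step"]


# Toy target (assumption): standard normal, so the exact score is -x.
def score(x):
    return -x


# Synthetic stand-in for a minibatch gradient: exact score plus Gaussian noise.
def noisy_grad(x, rng):
    return -x + rng.gauss(0.0, 1.0)


grid = [1e-4, 1e-3, 1e-2, 1e-1, 0.5, 1.5]
best_step = tune_step_size(grid, score, noisy_grad)
print("selected step size:", best_step)
```

In this sketch, very small step sizes are penalized because the chain barely leaves its starting point within the budget, while very large ones are penalized for discretization bias, so the discrepancy captures both failure modes without needing an acceptance rate. The budget here is counted in steps for simplicity; a wall-clock budget or a different Stein-based discrepancy could be substituted.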
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
