Harnessing the Power of Reinforcement Learning for Adaptive MCMC
Congye Wang, Matthew A. Fisher, Heishiro Kanagawa, Wilson Chen, Chris J. Oates

TL;DR
This paper advances adaptive Markov Chain Monte Carlo methods by integrating reinforcement learning, introducing a novel reward function based on contrastive divergence, and demonstrating improved sampling efficiency through extensive simulations.
Contribution
The paper introduces contrastive divergence as a new reward signal for RL-based MCMC, and develops adaptive gradient-based samplers that enhance learnability and flexibility.
Findings
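Contrastive divergence outperforms natural reward choices, such as the acceptance rate or the expected squared jump distance, in Reinforcement Learning Metropolis-Hastings (RLMH).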
Adaptive gradient-based samplers improve sampling efficiency.
Simulation results validate the practical effectiveness of RLMH.
Abstract
Sampling algorithms drive probabilistic machine learning, and recent years have seen an explosion in the diversity of tools for this task. However, the increasing sophistication of sampling algorithms is correlated with an increase in the tuning burden. There is now a greater need than ever to treat the tuning of samplers as a learning task in its own right. In a conceptual breakthrough, Wang et al. (2025) formulated Metropolis-Hastings as a Markov decision process, opening up the possibility for adaptive tuning using Reinforcement Learning (RL). Their emphasis was on theoretical foundations; realising the practical benefit of Reinforcement Learning Metropolis-Hastings (RLMH) was left for subsequent work. The purpose of this paper is twofold: First, we observe the surprising result that natural choices of reward, such as the acceptance rate or the expected squared jump distance, provide…
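To make the two "natural" rewards named in the abstract concrete, the following minimal sketch computes the acceptance rate and the expected squared jump distance (ESJD) for a plain random-walk Metropolis-Hastings sampler. It is an illustration only, not the paper's RLMH method: the standard-normal target, the function names log_target and random_walk_mh, and the step sizes are all assumptions chosen for the example.

```python
import math
import random


def log_target(x):
    """Log-density of a standard normal up to a constant (illustrative target, an assumption)."""
    return -0.5 * x * x


def random_walk_mh(n_steps, step_size, x0=0.0, seed=0):
    """Run 1-D random-walk Metropolis-Hastings.

    Returns (acceptance_rate, esjd), where esjd is the average squared
    distance moved per iteration (rejected proposals contribute zero).
    """
    rng = random.Random(seed)
    x = x0
    accepted = 0
    squared_jumps = 0.0
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces to
        # the ratio of target densities.
        proposal = x + step_size * rng.gauss(0.0, 1.0)
        log_alpha = log_target(proposal) - log_target(x)
        accept_prob = math.exp(min(0.0, log_alpha))
        if rng.random() < accept_prob:
            squared_jumps += (proposal - x) ** 2
            x = proposal
            accepted += 1
    return accepted / n_steps, squared_jumps / n_steps


if __name__ == "__main__":
    # Hypothetical step sizes: too small, moderate, too large.
    for step_size in (0.1, 2.4, 50.0):
        acc, esjd = random_walk_mh(20_000, step_size)
        print(f"step_size={step_size:5.1f}  acceptance={acc:.3f}  ESJD={esjd:.4f}")
```

In this toy setting the acceptance rate rises as the step size shrinks, while the ESJD is largest at the moderate step size, so the two quantities can favour different tunings. How either behaves as a reward signal inside an RL training loop is the question the paper addresses; this sketch does not attempt to reproduce that analysis.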
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference
