Harnessing the Power of Reinforcement Learning for Adaptive MCMC

Congye Wang; Matthew A. Fisher; Heishiro Kanagawa; Wilson Chen; Chris J. Oates

arXiv:2507.00671 · stat.CO · July 2, 2025

TL;DR

This paper advances adaptive Markov chain Monte Carlo (MCMC) methods that use reinforcement learning: it introduces a novel reward function based on contrastive divergence and demonstrates improved sampling efficiency in an extensive simulation study.

Contribution

It introduces contrastive divergence as a new reward signal for RL-based MCMC and develops adaptive gradient-based samplers that enhance both learnability and flexibility.
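For readers less familiar with gradient-based Metropolis-Hastings samplers, the following is a minimal sketch of one Metropolis-adjusted Langevin algorithm (MALA) step in Python. It assumes that the step size is the quantity an RL policy would adapt; the function name `mala_step` and this policy interface are hypothetical illustrations, and the samplers developed in the paper may be parametrised differently.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def mala_step(
    x: Vector,
    log_target: Callable[[Vector], float],
    grad_log_target: Callable[[Vector], Vector],
    step_size: float,
    rng: random.Random,
) -> Tuple[Vector, bool]:
    """One Metropolis-adjusted Langevin (MALA) transition.

    `step_size` (h) is treated as the action an RL policy would choose
    (hypothetical interface). Returns the next state and whether the
    proposal was accepted.
    """

    def drift(z: Vector) -> Vector:
        # Langevin drift: z + (h / 2) * grad log pi(z).
        g = grad_log_target(z)
        return [zi + 0.5 * step_size * gi for zi, gi in zip(z, g)]

    def log_q(to: Vector, frm: Vector) -> float:
        # log q(to | frm) for q = N(drift(frm), h * I), dropping the
        # normalising constant, which is identical in both directions.
        m = drift(frm)
        return -sum((ti - mi) ** 2 for ti, mi in zip(to, m)) / (2.0 * step_size)

    # Propose y ~ N(drift(x), h * I).
    y = [mi + math.sqrt(step_size) * rng.gauss(0.0, 1.0) for mi in drift(x)]
    # Metropolis-Hastings correction for the asymmetric Langevin proposal.
    log_alpha = log_target(y) + log_q(x, y) - log_target(x) - log_q(y, x)
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return y, True
    return x, False


if __name__ == "__main__":
    # Usage: sample a 2-D standard Gaussian with a fixed step size.
    def log_pi(z: Vector) -> float:
        return -0.5 * sum(zi * zi for zi in z)

    def grad_log_pi(z: Vector) -> Vector:
        return [-zi for zi in z]

    rng = random.Random(0)
    x, n_acc = [3.0, -3.0], 0
    for _ in range(1000):
        x, accepted = mala_step(x, log_pi, grad_log_pi, 0.5, rng)
        n_acc += accepted
    print("acceptance rate:", n_acc / 1000)
```

In an adaptive scheme, `step_size` would be supplied per step by a learned policy rather than held fixed as in the usage example.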

Findings

01

A reward based on contrastive divergence outperforms conventional rewards, such as the acceptance rate and the expected squared jump distance, in RLMH.

02

Adaptive gradient-based samplers improve sampling efficiency.

03

Simulation results validate the practical effectiveness of RLMH.

Abstract

Sampling algorithms drive probabilistic machine learning, and recent years have seen an explosion in the diversity of tools for this task. However, the increasing sophistication of sampling algorithms is correlated with an increase in the tuning burden. There is now a greater need than ever to treat the tuning of samplers as a learning task in its own right. In a conceptual breakthrough, Wang et al. (2025) formulated Metropolis-Hastings as a Markov decision process, opening up the possibility for adaptive tuning using Reinforcement Learning (RL). Their emphasis was on theoretical foundations; realising the practical benefit of Reinforcement Learning Metropolis-Hastings (RLMH) was left for subsequent work. The purpose of this paper is twofold: First, we observe the surprising result that natural choices of reward, such as the acceptance rate, or the expected squared jump distance, provide…
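The abstract names the acceptance rate and the expected squared jump distance (ESJD) as natural reward choices once Metropolis-Hastings is cast as a Markov decision process. The Python sketch below shows one common way to compute per-step versions of both for a random-walk Metropolis step, treating the proposal scale as the action. The function names and the Rao-Blackwellised form of the rewards (using the acceptance probability instead of the realised accept/reject outcome) are assumptions made for illustration; the paper's MDP formulation and reward definitions may differ.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def rwm_step_with_rewards(
    x: Vector,
    log_target: Callable[[Vector], float],
    scale: float,
    rng: random.Random,
) -> Tuple[Vector, float, float]:
    """One random-walk Metropolis step whose proposal scale is the action.

    Returns (next_state, acceptance_reward, esjd_reward), where
      acceptance_reward = alpha, the MH acceptance probability, and
      esjd_reward       = alpha * ||y - x||^2, the expected squared jump given y.
    """
    # Symmetric Gaussian proposal, so the Hastings ratio reduces to pi(y) / pi(x).
    y = [xi + scale * rng.gauss(0.0, 1.0) for xi in x]
    alpha = math.exp(min(0.0, log_target(y) - log_target(x)))
    sq_jump = sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    x_next = y if rng.random() < alpha else x
    return x_next, alpha, alpha * sq_jump


def average_rewards(scale: float, n_steps: int = 5000, seed: int = 1) -> Tuple[float, float]:
    """Average both rewards along a chain targeting a 1-D standard Gaussian."""
    rng = random.Random(seed)

    def log_pi(z: Vector) -> float:
        return -0.5 * sum(zi * zi for zi in z)

    x: Vector = [0.0]
    acc_total, esjd_total = 0.0, 0.0
    for _ in range(n_steps):
        x, acc_r, esjd_r = rwm_step_with_rewards(x, log_pi, scale, rng)
        acc_total += acc_r
        esjd_total += esjd_r
    return acc_total / n_steps, esjd_total / n_steps


if __name__ == "__main__":
    for s in (0.01, 1.0, 2.4, 10.0):
        acc, esjd = average_rewards(s)
        print(f"scale={s:>5}: mean acceptance={acc:.3f}, mean ESJD={esjd:.4f}")
```

Because the acceptance probability tends to 1 as `scale` shrinks to 0, an acceptance-rate reward on its own favours proposals that barely move. The squared-jump reward vanishes in that limit and also as the scale grows large, so on a target like a standard Gaussian it is maximised at an intermediate scale.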

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference