Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli

TL;DR
This paper introduces a scalable exploration strategy for reinforcement learning that directly samples from the posterior distribution of the Q function using Langevin Monte Carlo, improving efficiency and effectiveness in deep RL tasks.
Contribution
It proposes a novel Langevin Monte Carlo-based Thompson sampling method for RL, avoiding Gaussian approximations and enabling easy deployment in deep RL with theoretical guarantees.
Findings
Achieves a regret bound of \tilde{O}(d^{3/2}H^{3/2}\sqrt{T}) in linear MDPs
Demonstrates superior or comparable performance on Atari57 exploration tasks
Provides a practical and theoretically sound exploration method for deep RL
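For readers unfamiliar with the regret bound mentioned in the findings, the following states the definition commonly used in the episodic linear MDP literature. The notation is assumed from that standard setup rather than taken from this page: K is the number of episodes, H the horizon, s_1^k the initial state of episode k, and \pi^k the policy played in episode k.

```latex
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \left[ V_1^{*}\!\left(s_1^{k}\right) - V_1^{\pi^{k}}\!\left(s_1^{k}\right) \right],
\qquad T = KH .
```

Under this definition, a bound that grows sublinearly in T means the average per-step gap to the optimal policy shrinks as more steps are collected.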
Abstract
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo, an efficient type of Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of \tilde{O}(d^{3/2}H^{3/2}\sqrt{T}), where d is the…
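The "noisy gradient descent" idea in the abstract can be illustrated with a minimal sketch of a Langevin Monte Carlo update for a linear Q-function. This is not the authors' implementation. The function name `lmc_sample`, the ridge-regression loss, and the parameters `step_size`, `inv_temp`, and `reg` are assumptions chosen for illustration. The sketch uses a plain gradient step, whereas the taxonomy below lists Adam as the method used for the deep RL variant.

```python
import numpy as np


def lmc_sample(features, targets, w_init, step_size, inv_temp, n_steps, reg, rng):
    """Draw one approximate posterior sample of linear Q-function weights.

    Hypothetical loss (an assumption for this sketch):
        L(w) = ||features @ w - targets||^2 + reg * ||w||^2
    Each iteration is a gradient step on L plus Gaussian noise scaled by
    sqrt(2 * step_size / inv_temp), i.e. a Langevin Monte Carlo update.
    """
    w = np.array(w_init, dtype=float)
    for _ in range(n_steps):
        residual = features @ w - targets
        # Gradient of the regularized squared loss with respect to w.
        grad = 2.0 * features.T @ residual + 2.0 * reg * w
        noise = rng.standard_normal(w.shape)
        # Noisy gradient descent: the injected noise is what turns
        # optimization into (approximate) sampling.
        w = w - step_size * grad + np.sqrt(2.0 * step_size / inv_temp) * noise
    return w


def greedy_action(w, action_features):
    """Pick the action whose feature vector gives the largest sampled Q-value."""
    return int(np.argmax(action_features @ w))
```

In a Thompson-sampling loop, an agent would call `lmc_sample` once per decision round to get fresh weights and then act greedily with `greedy_action`, so exploration comes from the randomness of the sampled weights. A larger `inv_temp` concentrates samples around the loss minimizer, while a smaller one spreads them out.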
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
Methods: Adam
