On Efficient Bayesian Exploration in Model-Based Reinforcement Learning
Alberto Caron, Chris Hicks, Vasilios Mavroudis

TL;DR
This paper gives a theoretical grounding for information-theoretic, data-efficient exploration in model-based reinforcement learning, using exploration bonuses that target epistemic uncertainty, together with tractable approximations, to improve sample efficiency.
Contribution
It provides formal guarantees for epistemic uncertainty-based exploration bonuses and proposes a new framework, PTS-BE, combining planning with information gain for improved exploration.
Findings
PTS-BE outperforms baselines in sparse reward environments
Theoretical guarantees for information-gain (IG) based exploration bonuses are established
Practical approximations enable scalable implementation
Abstract
In this work, we address the challenge of data-efficient exploration in reinforcement learning by examining existing principled, information-theoretic approaches to intrinsic motivation. Specifically, we focus on a class of exploration bonuses that targets epistemic uncertainty rather than the aleatoric noise inherent in the environment. We prove that these bonuses naturally signal epistemic information gains and converge to zero once the agent becomes sufficiently certain about the environment's dynamics and rewards, thereby aligning exploration with genuine knowledge gaps. Our analysis provides formal guarantees for IG-based approaches, which previously lacked theoretical grounding. To enable practical use, we also discuss tractable approximations via sparse variational Gaussian Processes, Deep Kernels and Deep Ensemble models. We then outline a general framework - Predictive…
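To make the idea of an epistemic bonus that vanishes with certainty concrete, here is a minimal sketch in a toy conjugate-Gaussian setting. It is an illustration under stated assumptions, not the paper's method: the agent estimates a single unknown mean with a Gaussian prior and known observation noise, and the function names (posterior_variance, info_gain_bonus) are hypothetical. In this setting the expected information gain of the next observation about the unknown mean is 0.5 * log(1 + posterior_variance / noise_variance), which shrinks toward zero as data accumulates even though the aleatoric noise stays fixed.

```python
import math


def posterior_variance(prior_var: float, noise_var: float, n_obs: int) -> float:
    """Posterior variance of an unknown Gaussian mean after n_obs observations.

    Assumes a Gaussian prior with variance prior_var and known
    observation-noise variance noise_var (toy conjugate model).
    """
    return 1.0 / (1.0 / prior_var + n_obs / noise_var)


def info_gain_bonus(prior_var: float, noise_var: float, n_obs: int) -> float:
    """Expected information gain (in nats) of one more observation.

    Equals the mutual information between the next observation and the
    unknown mean: 0.5 * log(1 + epistemic variance / aleatoric variance).
    """
    post_var = posterior_variance(prior_var, noise_var, n_obs)
    return 0.5 * math.log(1.0 + post_var / noise_var)


# The aleatoric noise (noise_var) never changes, yet the bonus decays
# because only the epistemic (posterior) variance enters the numerator.
counts = (0, 1, 10, 100, 1000)
bonuses = [info_gain_bonus(prior_var=1.0, noise_var=0.5, n_obs=n) for n in counts]

for n, b in zip(counts, bonuses):
    print(f"n_obs={n:5d}  bonus={b:.6f}")
```

The bonus depends on the ratio of epistemic to aleatoric variance, so a noisy but well-understood region earns almost no bonus, which is the qualitative behaviour the abstract describes. The models named in the abstract (sparse variational Gaussian Processes, Deep Kernels, Deep Ensembles) would replace the closed-form posterior used in this toy example.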
