A Bayesian Sampling Approach to Exploration in Reinforcement Learning
John Asmuth, Lihong Li, Michael L. Littman, Ali Nouri, David Wingate

TL;DR
This paper introduces BOSS, a Bayesian sampling method for reinforcement learning that efficiently balances exploration and exploitation by sampling multiple models from the posterior, achieving near-optimal rewards with low sample complexity.
Contribution
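The paper proposes a modular Bayesian approach to reinforcement learning. It extends prior sampling-based work with a rule for deciding when to resample models from the posterior and how to combine the sampled models, and it reports favorable performance against state-of-the-art methods along with the flexibility to use different underlying models.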
Findings
Achieves near-optimal reward with high probability
Sample complexity is low relative to the speed at which the posterior converges
Performs favorably compared to state-of-the-art methods
Abstract
We present a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set), drives exploration by sampling multiple models from the posterior and selecting actions optimistically. It extends previous work by providing a rule for deciding when to resample and how to combine the models. We show that our algorithm achieves near-optimal reward with high probability with a sample complexity that is low relative to the speed at which the posterior distribution converges during learning. We demonstrate that BOSS performs quite favorably compared to state-of-the-art reinforcement-learning approaches and illustrate its flexibility by pairing it with a non-parametric model that generalizes across states.
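To make the idea of "sampling multiple models and selecting actions optimistically" concrete, here is a minimal sketch, not the authors' implementation. It assumes a small tabular MDP, a Dirichlet posterior over transitions, and a known reward table; the function names, the parameter `k` (number of sampled models), and the visit-count threshold `b` in `should_resample` are all illustrative assumptions. The sampled models are combined into one MDP whose action set is the union of every model's actions, so planning in it picks, per state, the most favorable sampled dynamics.

```python
import numpy as np

def sample_models(counts, k, rng):
    """Draw k transition models from an (assumed) Dirichlet posterior.

    counts has shape (S, A, S): prior pseudo-counts plus observed transitions.
    """
    n_states, n_actions, _ = counts.shape
    models = np.empty((k, n_states, n_actions, n_states))
    for i in range(k):
        for s in range(n_states):
            for a in range(n_actions):
                models[i, s, a] = rng.dirichlet(counts[s, a])
    return models

def merge_models(models):
    """Combine k models into one MDP with k * A actions per state.

    Merged action index i * A + a means "take action a under model i".
    """
    k, n_states, n_actions, _ = models.shape
    return models.transpose(1, 0, 2, 3).reshape(n_states, k * n_actions, n_states)

def value_iteration(trans, rewards, gamma=0.95, tol=1e-8, max_iter=10000):
    """Plain value iteration. trans: (S, M, S), rewards: (S, M)."""
    values = np.zeros(trans.shape[0])
    for _ in range(max_iter):
        q = rewards + gamma * (trans @ values)  # shape (S, M)
        new_values = q.max(axis=1)
        converged = np.max(np.abs(new_values - values)) < tol
        values = new_values
        if converged:
            break
    return values, q.argmax(axis=1)

def boss_policy(counts, rewards, k, rng, gamma=0.95):
    """Sample k models, plan in the merged MDP, map back to real actions."""
    n_actions = counts.shape[1]
    merged = merge_models(sample_models(counts, k, rng))
    # Rewards are assumed known, so every model copy shares the same table.
    merged_rewards = np.tile(rewards, (1, k))
    values, merged_actions = value_iteration(merged, merged_rewards, gamma)
    return merged_actions % n_actions, values

def should_resample(visits, s, a, b):
    """Assumed resampling rule: redraw models when (s, a) reaches b visits."""
    return bool(visits[s, a] == b)
```

Because the merged MDP contains every action of every sampled model, its optimal value in each state is at least as large as the optimal value under any single sampled model, which is the sense in which this sketch is optimistic. An agent loop would call `boss_policy` once, act on the returned policy while updating `counts` and a visit table, and call it again whenever `should_resample` fires.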
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Data Stream Mining Techniques
