$\epsilon$-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning
Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee

TL;DR
This paper introduces a Bayesian ensemble method for dynamically adapting epsilon in epsilon-greedy exploration, improving exploration efficiency in model-free reinforcement learning.
Contribution
It presents a novel Bayesian perspective on epsilon as a measure of Q-value uniformity and develops a closed-form Bayesian update for adaptive epsilon tuning.
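One way to read "epsilon as a measure of Q-value uniformity" is through the standard identity for the expected Q-value under an epsilon-greedy policy. This is a sketch for orientation only; the symbols $\pi_\epsilon$ and $\mathcal{A}$ (the action set) are notation assumed here, not taken from this page.

\[
\mathbb{E}_{a \sim \pi_\epsilon(\cdot \mid s)}\bigl[Q(s,a)\bigr]
  = (1-\epsilon)\,\max_{a \in \mathcal{A}} Q(s,a)
  + \frac{\epsilon}{|\mathcal{A}|}\sum_{a \in \mathcal{A}} Q(s,a)
\]

Under this reading, $\epsilon$ is the weight placed on the uniform average of the Q-values, and $1-\epsilon$ is the weight placed on the greedy maximum.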
Findings
Efficiently balances exploration and exploitation across various problems.
Outperforms fixed schedules and existing adaptive schemes.
Provides monotone convergence guarantees.
Abstract
Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms. In this paper, we focus on model-free RL using the epsilon-greedy exploration policy, which, despite its simplicity, remains one of the most frequently used forms of exploration. However, a key limitation of this policy is the specification of $\epsilon$. In this paper, we provide a novel Bayesian perspective on $\epsilon$ as a measure of the uniformity of the Q-value function. Building on this perspective, we introduce a closed-form Bayesian model update based on Bayesian model combination (BMC), which allows us to adapt $\epsilon$ using experiences from the environment in constant time with monotone convergence guarantees. We demonstrate that our proposed algorithm, $\epsilon$-BMC, efficiently balances…
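The abstract is truncated and does not give the update itself, so the following is only a minimal sketch of how a closed-form, constant-time Bayesian update of epsilon could look. It assumes a Beta(a, b) belief over epsilon, a two-component mixture of a "uniform" and a "greedy" model of the bootstrapped target, a Gaussian likelihood with a fixed standard deviation, and moment matching to project the posterior back to a Beta. All function names, the choice of likelihood, and the choice of observed quantity are hypothetical and are not confirmed by this page.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x (assumed likelihood, not from the paper)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def model_evidence(observed, reward, next_q, gamma=0.99, std=1.0):
    """Likelihood of `observed` under a uniform and a greedy bootstrapped target.

    Hypothetical helper: which quantity is treated as the observation is an
    assumption made for illustration.
    """
    target_uniform = reward + gamma * sum(next_q) / len(next_q)
    target_greedy = reward + gamma * max(next_q)
    return (gaussian_pdf(observed, target_uniform, std),
            gaussian_pdf(observed, target_greedy, std))


def update_beta(a, b, e_uniform, e_greedy):
    """One constant-time, moment-matched update of a Beta(a, b) belief on epsilon.

    The exact posterior is proportional to
        eps^(a-1) * (1-eps)^(b-1) * (eps * e_uniform + (1-eps) * e_greedy),
    which is a mixture of Beta(a+1, b) and Beta(a, b+1).
    """
    w_u = e_uniform * a
    w_g = e_greedy * b
    total = w_u + w_g
    if total <= 0.0:
        return a, b  # no usable evidence: keep the current belief
    w_u /= total
    w_g = 1.0 - w_u
    n = a + b
    # First and second moments of the two-component Beta mixture.
    m1 = (w_u * (a + 1) + w_g * a) / (n + 1)
    m2 = (w_u * (a + 1) * (a + 2) + w_g * a * (a + 1)) / ((n + 1) * (n + 2))
    var = m2 - m1 * m1
    if var <= 0.0:
        return a, b
    # For a Beta distribution, (mean - second moment) / variance = a + b.
    strength = (m1 - m2) / var
    return m1 * strength, (1.0 - m1) * strength


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

In an agent loop, one would call `model_evidence` after each transition, pass the two likelihoods to `update_beta`, and act with `epsilon = a / (a + b)`, the posterior mean. The update touches only two scalars per step, which is what "constant time" would mean in this sketch.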
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Data Stream Mining Techniques
