$\epsilon$-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning

Michael Gimelfarb; Scott Sanner; Chi-Guhn Lee

arXiv:2007.00869 · cs.LG · July 3, 2020

TL;DR

This paper introduces a Bayesian ensemble method for dynamically adapting epsilon in epsilon-greedy exploration, improving exploration efficiency in model-free reinforcement learning.
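For readers unfamiliar with the baseline being adapted, here is a minimal sketch of plain epsilon-greedy action selection with a fixed epsilon. The function name `epsilon_greedy` and the list-of-floats representation of Q-values are assumptions made for illustration; they do not come from the paper or its repository.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-values (illustrative helper)."""
    if rng.random() < epsilon:
        # Explore: sample uniformly over all actions
        # (this can also return the greedy action).
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value (first index on ties).
    return max(range(len(q_values)), key=q_values.__getitem__)


# Usage: with epsilon = 0 the choice is always the greedy action.
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

The difficulty the paper targets is visible in the signature: `epsilon` has to be supplied by hand, whether as a constant or as a schedule.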

Contribution

It presents a novel Bayesian perspective on epsilon as a measure of Q-value uniformity and develops a closed-form Bayesian update for adaptive epsilon tuning.
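One way to read the "uniformity" perspective, offered here as an interpretation rather than the paper's exact formulation, starts from a standard identity: under an epsilon-greedy policy, the expected value of the next state is a mixture of the greedy (maximum) Q-value and the uniform average of the Q-values, weighted by $1 - ε$ and $ε$. The sketch below computes that mixture; the function name `expected_next_value` is hypothetical.

```python
def expected_next_value(q_next, epsilon):
    """Expected next-state value under an epsilon-greedy policy."""
    greedy = max(q_next)                  # value if the agent exploits
    uniform = sum(q_next) / len(q_next)   # value if it acts uniformly at random
    # epsilon acts as the mixing weight between the two estimates.
    return (1.0 - epsilon) * greedy + epsilon * uniform


# Usage: halfway between the greedy value 3.0 and the uniform average 2.0.
value = expected_next_value([1.0, 3.0], epsilon=0.5)
```

When all Q-values in a state are equal, the two terms coincide and the choice of epsilon has no effect on the estimate, which is consistent with viewing epsilon as a weight on a "uniform" model of the Q-function.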

Findings

01 Efficiently balances exploration and exploitation across various problems.

02 Outperforms fixed schedules and existing adaptive schemes.

03 Provides monotone convergence guarantees.

Abstract

Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms. In this paper, we focus on model-free RL using the epsilon-greedy exploration policy, which, despite its simplicity, remains one of the most frequently used forms of exploration. However, a key limitation of this policy is the specification of $ε$. We provide a novel Bayesian perspective of $ε$ as a measure of the uniformity of the Q-value function. Building on this perspective, we introduce a closed-form Bayesian model update based on Bayesian model combination (BMC), which allows us to adapt $ε$ using experiences from the environment in constant time with monotone convergence guarantees. We demonstrate that our proposed algorithm, $ε$-\texttt{BMC}, efficiently balances…
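To give a feel for what a closed-form, constant-time Bayesian update of $ε$ can look like, the sketch below keeps a Beta(a, b) belief over $ε$ and updates it by moment matching after each observation. This is a generic two-model moment-matching step written under stated assumptions, not necessarily the update derived in the paper. In particular, `update_beta`, `lik_uniform`, and `lik_greedy` are hypothetical names: the two likelihoods stand for how well an observed return is explained by a "uniform" model and a "greedy" model of the Q-function, and the abstract does not say how the paper computes them.

```python
def update_beta(a, b, lik_uniform, lik_greedy):
    """One moment-matched update of a Beta(a, b) belief over epsilon.

    Assumes the observation likelihood is the mixture
    epsilon * lik_uniform + (1 - epsilon) * lik_greedy.
    """
    # The exact posterior is a two-component mixture:
    #   c_u * Beta(a + 1, b) + c_g * Beta(a, b + 1)
    w_u = a * lik_uniform
    w_g = b * lik_greedy
    c_u = w_u / (w_u + w_g)
    c_g = 1.0 - c_u
    n = a + b
    # First and second moments of that mixture.
    m1 = (c_u * (a + 1) + c_g * a) / (n + 1)
    m2 = (c_u * (a + 1) * (a + 2) + c_g * a * (a + 1)) / ((n + 1) * (n + 2))
    # Fit a single Beta distribution with the same two moments.
    var = m2 - m1 * m1
    total = m1 * (1.0 - m1) / var - 1.0
    return m1 * total, (1.0 - m1) * total


# Usage: evidence favouring the uniform model raises the mean of epsilon.
a, b = update_beta(2.0, 3.0, lik_uniform=0.9, lik_greedy=0.1)
epsilon = a / (a + b)  # posterior mean, usable as the exploration rate
```

Each update is a fixed number of arithmetic operations regardless of how much experience has been seen, which is the sense in which such a scheme runs in constant time. When both likelihoods are equal, the observation carries no information about $ε$ and the belief is left unchanged.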

Code & Models

Repositories

mike-gimelfarb/bayesian-epsilon-greedy
tf · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Data Stream Mining Techniques