Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

Christoph Dann; Yishay Mansour; Mehryar Mohri; Ayush Sekhari; Karthik Sridharan

arXiv:2206.09421 · cs.LG · June 22, 2022 · 26 cites

Open Access

TL;DR

This paper provides the first theoretical regret and sample-complexity bounds for epsilon-greedy reinforcement learning with function approximation, introducing a new complexity measure called the myopic exploration gap.

Contribution

The paper gives a theoretical analysis of myopic exploration policies, establishing regret and sample-complexity bounds and introducing the myopic exploration gap to characterize when such policies succeed.

Findings

01. Sample complexity scales with 1 / alpha^2, where alpha denotes the myopic exploration gap.

02. Results apply to value-function-based algorithms in episodic MDPs with bounded Bellman Eluder dimension.

03. Concrete examples show the effectiveness of the myopic exploration gap.

Abstract

Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks and yet, they perform well in many others. In fact, in practice, they are often selected as the top choices, due to their simplicity. But, for what tasks do such policies succeed? Can we give theoretical guarantees for their favorable performance? These crucial questions have been scarcely investigated, despite the prominent practical importance of these policies. This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration. Our results apply to value-function-based algorithms in episodic MDPs with bounded Bellman Eluder dimension. We propose a new complexity measure called myopic exploration gap, denoted by alpha, that captures a…
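As an illustration of the kind of myopic exploration policy the abstract refers to, the following is a minimal sketch of epsilon-greedy action selection over a vector of estimated action values. It is not taken from the paper: the function name `epsilon_greedy_action`, its parameters, and the tie-breaking rule (lowest index wins) are assumptions made for this example only.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy_action(
    q_values: Sequence[float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action index from estimated action values.

    Hypothetical helper for illustration, not the paper's algorithm.
    With probability `epsilon` a uniformly random action is chosen
    (exploration); otherwise the action with the highest estimated
    value is chosen (exploitation).
    """
    if not q_values:
        raise ValueError("q_values must contain at least one action")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")

    rng = rng or random.Random()

    # Explore: ignore the value estimates entirely.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))

    # Exploit: greedy action; ties resolve to the lowest index.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    estimates = [0.1, 0.9, 0.3]  # made-up value estimates for three actions
    print(epsilon_greedy_action(estimates, epsilon=0.1))
```

With `epsilon = 0` the function always returns the greedy action, and with `epsilon = 1` it ignores the estimates and samples uniformly. The exploration step is "myopic" in the sense that it never plans to reach informative states; it only perturbs the current greedy choice.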

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference