On Lower Bounds for Regret in Reinforcement Learning

Ian Osband; Benjamin Van Roy

arXiv:1608.02732 · stat.ML · August 10, 2016 · 49 cites

Open Access

TL;DR

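This technical note clarifies the state of lower bounds on regret in reinforcement learning: it reproduces a known lower bound, argues that a previously published lower bound lacks a rigorous proof and should be treated as a conjecture, and suggests that upper bounds can be improved to match the weaker lower bounds.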

Contribution

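It reproduces a known regret lower bound, argues that the proposed proof of Theorem 6 in the REGAL paper (Bartlett and Tewari 2009) does not hold using standard techniques, and suggests that upper bounds can be improved to match the weaker lower bounds.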

Findings

01

Reproduces a regret lower bound similar to Theorem 5 of the journal UCRL2 paper (Jaksch et al 2010)

02

Argues that Theorem 6 of the REGAL paper (Bartlett and Tewari 2009) has no rigorous proof and should be considered a conjecture

03

Suggests that the conjectured REGAL lower bound is incorrect and that upper bounds can be tightened to match the weaker lower bounds

Abstract

This is a brief technical note to clarify the state of lower bounds on regret for reinforcement learning. In particular, this paper:

- Reproduces a lower bound on regret for reinforcement learning, similar to the result of Theorem 5 in the journal UCRL2 paper (Jaksch et al 2010).
- Clarifies that the proposed proof of Theorem 6 in the REGAL paper (Bartlett and Tewari 2009) does not hold using the standard techniques without further work. We suggest that this result should instead be considered a conjecture as it has no rigorous proof.
- Suggests that the conjectured lower bound given by (Bartlett and Tewari 2009) is incorrect and, in fact, it is possible to improve the scaling of the upper bound to match the weaker lower bounds presented in this paper.

We hope that this note serves to clarify existing results in the field of reinforcement learning and provides interesting…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems