On Lower Bounds for Regret in Reinforcement Learning
Ian Osband, Benjamin Van Roy

TL;DR
This technical note clarifies the state of lower bounds on regret in reinforcement learning: it reproduces a known lower bound, argues that a previously published stronger bound lacks a rigorous proof, and suggests that upper bounds can be improved to match the weaker bound.
Contribution
It reproduces a known lower bound, shows that the proposed proof of a stronger published lower bound does not hold using standard techniques, and proposes that upper bounds can be improved to match the weaker lower bound.
Findings
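Reproduces a regret lower bound similar to Theorem 5 of the UCRL2 journal paper (Jaksch et al. 2010)
Argues that Theorem 6 of the REGAL paper (Bartlett and Tewari 2009) has no rigorous proof and should be treated as a conjecture
Suggests that this conjectured lower bound is incorrect and that upper bounds can be tightened to match the weaker lower bound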
Abstract
This is a brief technical note to clarify the state of lower bounds on regret for reinforcement learning. In particular, this paper:
- Reproduces a lower bound on regret for reinforcement learning, similar to the result of Theorem 5 in the journal UCRL2 paper (Jaksch et al. 2010).
- Clarifies that the proposed proof of Theorem 6 in the REGAL paper (Bartlett and Tewari 2009) does not hold using the standard techniques without further work. We suggest that this result should instead be considered a conjecture as it has no rigorous proof.
- Suggests that the conjectured lower bound given by (Bartlett and Tewari 2009) is incorrect and, in fact, it is possible to improve the scaling of the upper bound to match the weaker lower bounds presented in this paper.
We hope that this note serves to clarify existing results in the field of reinforcement learning and provides interesting…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
