Minimax Regret Bounds for Reinforcement Learning

Mohammad Gheshlaghi Azar; Ian Osband; Rémi Munos

arXiv:1703.05449 · stat.ML · July 4, 2017 · 51 citations

TL;DR

This paper introduces an optimistic value-iteration algorithm for finite-horizon MDPs whose regret bound improves on previous results and, once the number of time-steps is large enough, matches the known lower bound up to logarithmic factors.

Contribution

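It presents an optimistic modification of value iteration with a tighter regret bound. The analysis applies concentration inequalities to the optimal value function as a whole rather than to the transition probabilities (improving the dependence on S), and uses Bernstein-based exploration bonuses built from the empirical variance of next-state values (improving the dependence on H).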
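
The optimistic value-iteration step can be sketched as backward induction over one episode, adding an exploration bonus to each empirical Bellman backup and capping the result at the largest achievable return. This is a minimal illustration, not the authors' implementation: the function name `optimistic_value_iteration`, the log factor `L`, and the bonus constants are assumptions, and the paper's Bernstein bonus also contains a correction term (for using the optimistic estimate of V rather than the true V*) that is omitted here. Rewards are assumed known and in [0, 1].

```python
import numpy as np


def optimistic_value_iteration(P_hat, R, N, H, T, delta=0.1, bernstein=True):
    """Optimistic backward induction over one episode (hypothetical sketch).

    P_hat : (S, A, S) empirical transition probabilities
    R     : (S, A) mean rewards, assumed known and in [0, 1]
    N     : (S, A) visit counts for each state-action pair
    H, T  : horizon and total number of time-steps (T only enters the log factor)
    The bonus constants are illustrative assumptions, not the paper's.
    """
    S, A, _ = P_hat.shape
    L = np.log(5 * S * A * T / delta)        # hypothetical confidence log factor
    n = np.maximum(N, 1)                     # avoid division by zero
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))                 # V[H] = 0: no reward after the episode
    for h in range(H - 1, -1, -1):
        mean_next = P_hat @ V[h + 1]         # (S, A): empirical E[V_{h+1}(s')]
        if bernstein:
            # variance-aware bonus: empirical variance of V_{h+1} under P_hat
            var_next = np.maximum(P_hat @ (V[h + 1] ** 2) - mean_next ** 2, 0.0)
            bonus = np.sqrt(2 * L * var_next / n) + 7 * H * L / (3 * n)
        else:
            # Hoeffding-style bonus: scales with the range H of V_{h+1}
            bonus = H * np.sqrt(2 * L / n)
        q = np.minimum(R + mean_next + bonus, H - h)  # cap at max return from step h
        Q[h] = np.where(N > 0, q, H - h)              # unvisited pairs: full optimism
        V[h] = Q[h].max(axis=1)                       # greedy optimistic value
    return Q, V
```

In use, the agent would act greedily with respect to `Q[h]` during the episode, then update `N`, `P_hat`, and recompute. The variance-aware branch gives smaller bonuses where next-state values vary little, which is the intuition behind the improved dependence on H; because the bonus is non-negative, the returned values upper-bound the optimal values whenever `P_hat` is accurate.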

Findings

01

Achieves a regret bound of Õ(√(HSAT) + H^2S^2A + H√T)

02

Matches the lower bound Ω(√(HSAT)) up to logarithmic factors when T ≥ H^3S^3A and SA ≥ H

03

Improves on the Õ(HS√(AT)) bound of UCRL2 by a factor of √(HS) in the leading term
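
The findings above reduce to simple arithmetic on the three terms of the bound. The sketch below drops constants and logarithmic factors (an assumption made only for illustration), checks the two regime conditions, and computes the ratio to the UCRL2 leading term. The function names are hypothetical.

```python
import math


def regret_terms(H, S, A, T):
    """The three terms of the bound, with constants and log factors dropped."""
    return {
        "leading": math.sqrt(H * S * A * T),    # sqrt(HSAT)
        "second_order": H**2 * S**2 * A,        # H^2 S^2 A
        "horizon": H * math.sqrt(T),            # H sqrt(T)
    }


def in_minimax_regime(H, S, A, T):
    """Conditions under which sqrt(HSAT) is the largest of the three terms."""
    # T >= H^3 S^3 A  =>  sqrt(HSAT) >= sqrt(HSA * H^3 S^3 A) = H^2 S^2 A
    # S A >= H        =>  sqrt(HSAT) >= sqrt(H * H * T)       = H sqrt(T)
    return T >= H**3 * S**3 * A and S * A >= H


def improvement_over_ucrl2(H, S, A, T):
    """Ratio of UCRL2's leading term HS*sqrt(AT) to sqrt(HSAT); always sqrt(HS)."""
    return (H * S * math.sqrt(A * T)) / math.sqrt(H * S * A * T)
```

For example, with H = 5, S = 10, A = 4 the threshold is T = H^3S^3A = 500,000, at which √(HSAT) and H^2S^2A both equal 10,000; for much smaller T the second-order term dominates and the bound is no longer minimax-optimal.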

Abstract

We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of $\tilde{O}(\sqrt{HSAT} + H^{2} S^{2} A + H\sqrt{T})$ where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions and $T$ the number of time-steps. This result improves over the best previous known bound $\tilde{O}(HS\sqrt{AT})$ achieved by the UCRL2 algorithm of Jaksch et al., 2010. The key significance of our new results is that when $T \geq H^{3} S^{3} A$ and $SA \geq H$, it leads to a regret of $\tilde{O}(\sqrt{HSAT})$ that matches the established lower bound of $\Omega(\sqrt{HSAT})$ up to a logarithmic factor. Our analysis contains two key insights. We use careful application of concentration inequalities to the optimal value function as a whole, rather than to the transitions…
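
The first insight, concentrating the scalar estimate of the expected next-state value rather than the whole transition vector, can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's analysis: it draws a random next-state distribution and a fixed value vector in [0, H] (standing in for the optimal value function), then compares the actual error of the empirical backup with the worst-case bound obtained from the L1 error of the estimated transitions. The function name and all parameters are assumptions.

```python
import numpy as np


def deviation_gap(S=200, H=10, n=100, trials=200, seed=0):
    """Mean |(P_hat - P) . V| versus mean H * ||P_hat - P||_1 over random trials."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(S))            # hypothetical true next-state distribution
    V = rng.uniform(0, H, size=S)            # fixed value vector, standing in for V*
    scalar, l1 = [], []
    for _ in range(trials):
        P_hat = rng.multinomial(n, P) / n    # empirical distribution from n samples
        scalar.append(abs((P_hat - P) @ V))        # error of the scalar backup
        l1.append(H * np.abs(P_hat - P).sum())     # bound via transition L1 error
    return float(np.mean(scalar)), float(np.mean(l1))
```

Since V lies in [0, H], the scalar error can never exceed the L1-based bound, and with many states and few samples it is typically far smaller: the scalar deviation shrinks roughly like H/√n, while the L1 route picks up an extra dependence on S. Note that this argument needs V to be fixed in advance, which is why the analysis targets the optimal value function.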

Code & Models

Repositories

seanrsinclair/AdaptiveQLearning

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms