Is Q-Learning Provably Efficient? An Extended Analysis

Kushagra Rastogi; Jonathan Lee; Fabrice Harel-Canada; Aditya Joglekar

arXiv:2009.10396·cs.LG·September 23, 2020·1 cites

TL;DR

This paper extends the theoretical analysis of Q-learning with UCB exploration, demonstrating it achieves sample efficiency comparable to optimal model-based methods, and provides a survey of related research.

Contribution

It expands on the proofs of Jin et al. that establish the provable efficiency of Q-learning with UCB exploration, and places that result in the context of existing research on reinforcement learning guarantees.

Findings

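01. Q-learning with UCB exploration achieves regret that matches the optimal regret attainable by any model-based approach.

02. The paper surveys related research on theoretical guarantees for model-free reinforcement learning.

03. The paper highlights the critical proof steps behind the efficiency result of Jin et al.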

Abstract

This work extends the analysis of the theoretical results presented in the paper "Is Q-Learning Provably Efficient?" by Jin et al. We include a survey of related research to contextualize the need for strengthening the theoretical guarantees related to perhaps the most important threads of model-free reinforcement learning. We also expound upon the reasoning used in the proofs to highlight the critical steps leading to the main result, which shows that Q-learning with UCB exploration achieves a sample efficiency that matches the optimal regret that can be achieved by any model-based approach.
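For readers who want a concrete picture of "Q-learning with UCB exploration", the following is a minimal sketch of the Hoeffding-bonus variant as it is commonly stated for the tabular, finite-horizon setting of Jin et al.: optimistic initialization at H, learning rate (H + 1) / (H + t), and a bonus proportional to sqrt(H^3 * iota / t). The environment callbacks `step` and `reset`, the constant `c`, and the failure probability `p` are assumptions introduced here for illustration; they do not come from this page.

```python
import math


def ucb_hoeffding_q_learning(step, reset, S, A, H, K, c=1.0, p=0.05):
    """Tabular episodic Q-learning with a Hoeffding-style UCB bonus (sketch).

    step(h, s, a) -> (reward in [0, 1], next_state) and reset() -> initial
    state are hypothetical environment callbacks supplied by the caller.
    """
    # Log factor iota = log(S * A * T / p), with T = H * K total steps.
    iota = math.log(S * A * H * K / p)
    # Optimistic initialization: every Q-value starts at the maximum return H.
    Q = [[[float(H)] * A for _ in range(S)] for _ in range(H)]
    # V has one extra layer so that the value after the last step is 0.
    V = [[float(H)] * S for _ in range(H)] + [[0.0] * S]
    N = [[[0] * A for _ in range(S)] for _ in range(H)]  # visit counts
    returns = []
    for _ in range(K):
        s = reset()
        total = 0.0
        for h in range(H):
            # Act greedily with respect to the optimistic Q-values.
            a = max(range(A), key=lambda i: Q[h][s][i])
            r, s_next = step(h, s, a)
            total += r
            N[h][s][a] += 1
            t = N[h][s][a]
            alpha = (H + 1) / (H + t)                  # learning rate
            bonus = c * math.sqrt(H ** 3 * iota / t)   # exploration bonus
            target = r + V[h + 1][s_next] + bonus
            Q[h][s][a] = (1 - alpha) * Q[h][s][a] + alpha * target
            # Value estimates are clipped at H, the largest possible return.
            V[h][s] = min(float(H), max(Q[h][s]))
            s = s_next
        returns.append(total)
    return Q, V, returns
```

The bonus shrinks as 1 / sqrt(t) with the visit count, so rarely tried state-action pairs keep inflated values and get explored. With `c = 1.0` the bonus is conservative on small problems, so the estimates can stay near H for many episodes; the sketch is meant to show the update rule, not to be a tuned implementation.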

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques

Methods: Q-Learning