Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko

arXiv:2010.03531·cs.LG·October 9, 2020·5 cites

TL;DR

This paper establishes new problem-independent lower bounds on sample complexity and regret for episodic MDPs, focusing on the non-stationary case in which the transition kernel may change at each stage of the episode.

Contribution

It introduces novel problem-independent lower bounds for sample complexity and regret in non-stationary episodic MDPs, using new constructions of hard MDPs.

Findings

01 Lower bound of Ω((H^3SA/ε^2) log(1/δ)) on the sample complexity of (ε, δ)-PAC algorithms for best policy identification.

02 Regret lower bound of Ω(√(H^3SAT)) for non-stationary MDPs.

03 Connections to PAC-MDP lower bounds are discussed.

Abstract

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a novel lower bound of $\Omega((H^{3} S A / \varepsilon^{2}) \log(1/\delta))$ on the sample complexity of an $(\varepsilon, \delta)$-PAC algorithm for best policy identification in a non-stationary MDP. This lower bound relies on a construction of "hard MDPs" which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^{3} S A T})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
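To get a feel for how the two rates in the abstract scale, the following is a minimal Python sketch that evaluates them with all constants dropped. The function names are hypothetical, the outputs are orders of magnitude rather than actual bounds, and reading T as the number of episodes is an assumption about the notation, not something stated on this page.

```python
import math


def pac_sample_complexity_rate(H, S, A, eps, delta):
    """Order of the PAC lower bound (H^3 S A / eps^2) * log(1/delta).

    Constants hidden by the Omega notation are dropped, so this is only
    a scaling illustration, not a usable bound.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return (H**3 * S * A / eps**2) * math.log(1.0 / delta)


def regret_rate(H, S, A, T):
    """Order of the regret lower bound sqrt(H^3 S A T), constants dropped.

    T is assumed here to be the number of episodes.
    """
    if T < 0:
        raise ValueError("T must be non-negative")
    return math.sqrt(H**3 * S * A * T)


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    H, S, A = 10, 20, 4
    print(pac_sample_complexity_rate(H, S, A, eps=0.1, delta=0.05))
    print(regret_rate(H, S, A, T=10_000))
```

Under these rates, doubling the horizon H multiplies the sample complexity rate by 8 and the regret rate by 2√2, while halving ε multiplies the sample complexity rate by 4.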

Code & Models

Datasets

misovalko/my-research-papers
dataset · 21 downloads

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms