Optimistic PAC Reinforcement Learning: the Instance-Dependent View

Andrea Tirinzoni; Aymen Al-Marjani; Emilie Kaufmann

arXiv:2207.05852·cs.LG·July 14, 2022

Open Access

TL;DR

This paper gives the first instance-dependent sample complexity bound for an optimistic PAC RL algorithm, BPI-UCRL. It shows that the algorithm is near-optimal in deterministic MDPs and sheds light on how the complexity of PAC RL differs from that of regret minimization.

Contribution

It provides the first instance-dependent analysis for optimistic PAC RL algorithms and introduces a new simple analysis technique called the "target trick."

Findings

1. BPI-UCRL achieves near-optimal sample complexity in deterministic MDPs.

2. The analysis introduces a refined notion of sub-optimality gap.

3. A hardness result explains the complexity gap between PAC RL and regret minimization.

Abstract

Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, both from a minimax and an instance-dependent view. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2021) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were available (Kaufmann et al., 2021). While our bound features some minimal visitation probabilities, it also features a refined notion of sub-optimality gap compared to the value gaps that appear in prior work. Moreover, in MDPs with…
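
For readers new to the setting, the PAC criterion and the value gaps mentioned in the abstract can be written as below. This is only a sketch in standard episodic-MDP notation, which is an assumption here: the symbols $\varepsilon$, $\delta$, $\hat{\pi}$, $s_1$, $V_h$, and $Q_h$ do not appear in the text above, and a fixed initial state $s_1$ is assumed for simplicity. The paper's refined gap is not reproduced.

```latex
% (\varepsilon,\delta)-PAC criterion (assumed notation): the returned policy
% \hat{\pi} is \varepsilon-optimal from the initial state s_1 with
% probability at least 1 - \delta.
\mathbb{P}\left( V_1^{\star}(s_1) - V_1^{\hat{\pi}}(s_1) \le \varepsilon \right) \ge 1 - \delta

% Value gap of action a in state s at stage h (the quantity used in prior
% work, against which the abstract contrasts its refined gap).
\Delta_h(s,a) = V_h^{\star}(s) - Q_h^{\star}(s,a)
```

The sample complexity is then the number of episodes collected before the algorithm stops and returns $\hat{\pi}$; an instance-dependent bound expresses it through quantities such as these gaps rather than only through the number of states, actions, and the horizon.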

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics