Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Christoph Dann; Tor Lattimore; Emma Brunskill

arXiv:1703.07710 · cs.LG · January 3, 2018 · 60 citations

TL;DR

This paper introduces Uniform-PAC, a framework that unifies PAC and regret bounds in episodic reinforcement learning: a single Uniform-PAC guarantee implies both a classical PAC bound and a high-probability regret bound.
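The difference between the two guarantees can be made concrete on one realized run of an algorithm. The sketch below assumes each episode k has an optimality gap Δ_k (optimal value minus the value of the policy played in that episode) and counts ε-mistakes N_ε = #{k : Δ_k > ε}. Roughly, a classical PAC guarantee bounds N_ε for one fixed ε, while Uniform-PAC requires a bound F(1/ε) that holds for all ε > 0 at once; both hold with probability at least 1 - δ, which this deterministic, single-run check ignores. The function names and the example bound 1/ε² are hypothetical illustrations, not code from the paper or its repository.

```python
from typing import Callable, Sequence


def mistake_count(gaps: Sequence[float], eps: float) -> int:
    """N_eps: number of episodes whose optimality gap exceeds eps."""
    return sum(1 for g in gaps if g > eps)


def satisfies_pac(gaps: Sequence[float], eps: float, budget: float) -> bool:
    """PAC-style check for ONE fixed accuracy eps: at most `budget` eps-mistakes."""
    return mistake_count(gaps, eps) <= budget


def satisfies_uniform_pac(gaps: Sequence[float],
                          bound: Callable[[float], float]) -> bool:
    """Uniform-PAC-style check: N_eps <= bound(eps) for ALL eps > 0.

    Assumes `bound` is continuous and non-increasing in eps. N_eps only
    changes at observed gap values, so for each distinct positive gap v the
    binding case is eps just below v, where every gap >= v is a mistake.
    """
    for v in sorted({g for g in gaps if g > 0}, reverse=True):
        if sum(1 for g in gaps if g >= v) > bound(v):
            return False
    return True


# Hypothetical run: the gap stays at 0.05 in every one of 1000 episodes.
flat_gaps = [0.05] * 1000
print(satisfies_pac(flat_gaps, eps=0.1, budget=10))          # True: no 0.1-mistakes
print(satisfies_uniform_pac(flat_gaps, lambda e: 1 / e**2))  # False: fails near eps = 0.05
print(sum(flat_gaps))                                        # regret grows linearly in T

# Hypothetical run whose gaps shrink: passes the uniform check.
print(satisfies_uniform_pac([0.5, 0.25, 0.125], lambda e: 1 / e**2))  # True
```

The flat run passes the PAC check at ε = 0.1 with zero mistakes, yet its summed gaps (the regret) grow linearly with the number of episodes; it is the uniform requirement over all ε that rules such behavior out.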

Contribution

The paper proposes the Uniform-PAC framework, which bridges PAC and regret analysis, and presents a new algorithm for finite-state episodic MDPs whose regret and PAC guarantees are optimal up to a factor of the horizon.
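In a finite-state episodic MDP, the per-episode optimality gap that mistake counts and regret are built from can be computed exactly when the model is known, by backward induction over the H steps. This sketch assumes the gap is measured from the episode's initial state, a tabular model with transitions P[s, a, s'] and rewards R[s, a] in [0, 1], 0-indexed steps, and a non-stationary deterministic policy. The function `evaluate` and the two-state toy MDP are hypothetical; this is a known-model planning routine, not the paper's learning algorithm, which has to act without knowing P and R.

```python
import numpy as np


def evaluate(P: np.ndarray, R: np.ndarray, H: int, policy=None) -> np.ndarray:
    """Backward induction for a finite-horizon tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) rewards in [0, 1].
    policy: (H, S) integer array of actions per step, or None for the
    optimal value. Returns V with shape (H + 1, S) and V[H] = 0.
    """
    S = P.shape[0]
    V = np.zeros((H + 1, S))
    for t in range(H - 1, -1, -1):
        Q = R + P @ V[t + 1]          # Q[s, a] = R[s, a] + E[V_{t+1}(s')]
        if policy is None:
            V[t] = Q.max(axis=1)      # Bellman optimality backup
        else:
            V[t] = Q[np.arange(S), policy[t]]  # value of the given policy
    return V


# Hypothetical toy MDP: state 0 pays nothing; action 1 moves to state 1,
# which is absorbing and pays reward 1 per step.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0
R = np.array([[0.0, 0.0], [1.0, 1.0]])
H = 3

V_star = evaluate(P, R, H)
V_pi = evaluate(P, R, H, policy=np.zeros((H, 2), dtype=int))  # always action 0
gap = V_star[0, 0] - V_pi[0, 0]
print(gap)  # 2.0
```

With rewards in [0, 1], every such gap lies in [0, H], which is what makes summing gaps over episodes a meaningful regret measure.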

Findings

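1. The new algorithm is Uniform-PAC, and its regret and PAC bounds are optimal up to a factor of the horizon.

2. Unlike a classical PAC guarantee, a Uniform-PAC guarantee can be used to derive high-probability regret bounds.

3. The framework connects PAC and regret analysis, which had previously been developed separately.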
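The finding that Uniform-PAC yields regret guarantees can be illustrated with a simplified calculation. Assume, as a hypothetical special case, that on the Uniform-PAC event the mistake bound has the form N_ε ≤ C/ε² for every ε > 0, with C absorbing S, A, H and logarithmic factors, and take regret R(T) to be the sum of the per-episode gaps Δ_k. This is an illustrative derivation, not the statement proved in the paper.

```latex
\begin{align*}
&\Delta_{(1)} \ge \Delta_{(2)} \ge \dots \ge \Delta_{(T)}
  \quad \text{(gaps of the first } T \text{ episodes, sorted)}\\
&\varepsilon < \Delta_{(i)} \;\Rightarrow\; i \le N_\varepsilon \le \frac{C}{\varepsilon^2}
  \;\Rightarrow\; i \le \frac{C}{\Delta_{(i)}^2}
  \;\Rightarrow\; \Delta_{(i)} \le \sqrt{C/i}\\
&R(T) = \sum_{k=1}^{T} \Delta_k = \sum_{i=1}^{T} \Delta_{(i)}
  \le \sqrt{C} \sum_{i=1}^{T} i^{-1/2}
  \le \sqrt{C} \int_{0}^{T} x^{-1/2}\,dx = 2\sqrt{CT}
\end{align*}
```

Because the event covers every ε at once, the bound holds for every T simultaneously. A PAC guarantee for a single ε only controls gaps above that ε; gaps just below it can still contribute regret that grows linearly in T.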

Abstract

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.
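The claim that the uniform version yields regret guarantees can also be checked numerically under a simplifying assumption: a hypothetical mistake bound N_ε ≤ C/ε² for all ε, and per-episode gaps in [0, H] (rewards in [0, 1] per step). Under these constraints the i-th largest gap can be at most min(H, √(C/i)), and that sequence is itself feasible, so summing it gives the worst-case regret over T episodes. The function name and the constants C and H are arbitrary illustrations, not values from the paper.

```python
import math


def worst_case_regret(C: float, H: float, T: int) -> float:
    """Largest possible sum of T per-episode gaps when every gap lies in
    [0, H] and the eps-mistake count obeys N_eps <= C / eps**2 for all eps.
    The i-th largest gap can be at most min(H, sqrt(C / i)), and that
    sequence is itself feasible, so summing it gives the worst case."""
    return sum(min(H, math.sqrt(C / i)) for i in range(1, T + 1))


C, H = 100.0, 10.0  # hypothetical constants
for T in (10, 1_000, 100_000):
    r = worst_case_regret(C, H, T)
    # Sublinear: r <= 2 * sqrt(C * T), so the average gap r / T shrinks.
    print(T, round(r, 1), round(2 * math.sqrt(C * T), 1), round(r / T, 4))
```

The printed average gap r / T falls as T grows, which is the sublinear-regret behavior a bound for only one fixed ε cannot guarantee.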


Code & Models

Repositories

chrodan/FiniteEpisodicRL.jl (official)


Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management