Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann, Tor Lattimore, Emma Brunskill

TL;DR
This paper introduces Uniform-PAC, a framework for episodic reinforcement learning that strengthens the classical PAC framework so that a single guarantee also yields high-probability regret bounds, unifying the two styles of analysis.
Contribution
The paper proposes the Uniform-PAC framework, bridging PAC and regret analysis, and presents a new algorithm that achieves near-optimal regret and PAC guarantees in finite-state episodic MDPs.
Findings
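The new algorithm is Uniform-PAC and simultaneously achieves regret and PAC guarantees that are optimal except for a factor of the horizon.
Unlike the classical PAC framework, the Uniform-PAC framework can be used to derive high-probability regret guarantees.
The framework bridges the PAC and regret setups, a connection that had been missing in the literature.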
Abstract
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high-probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.
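The link between the two setups can be illustrated with a small numerical sketch. It assumes the usual reading of these quantities, which the abstract does not spell out: a PAC-style guarantee bounds the number of episodes whose optimality gap exceeds a threshold eps, and regret is the sum of the gaps over episodes. Under that assumption, regret equals the integral of the mistake count over all eps, which suggests why a bound that holds for every eps at once carries over to regret. The function names and the example gaps below are hypothetical and chosen for illustration only; this is not the paper's algorithm.

```python
from typing import Sequence


def count_eps_mistakes(gaps: Sequence[float], eps: float) -> int:
    """Number of episodes whose optimality gap is strictly greater than eps."""
    return sum(1 for g in gaps if g > eps)


def regret(gaps: Sequence[float]) -> float:
    """Cumulative regret: the sum of per-episode optimality gaps."""
    return sum(gaps)


def regret_from_mistake_counts(gaps: Sequence[float]) -> float:
    """Recover regret by integrating the mistake count over eps >= 0.

    The mistake count is a step function of eps that only changes at the
    distinct gap values, so the integral is an exact finite sum.
    Assumes all gaps are non-negative.
    """
    thresholds = sorted(set(gaps) | {0.0})
    total = 0.0
    for lo, hi in zip(thresholds, thresholds[1:]):
        # On [lo, hi) the count is constant and equals the count at lo.
        total += count_eps_mistakes(gaps, lo) * (hi - lo)
    return total


# Hypothetical optimality gaps for five episodes (illustrative values only).
example_gaps = [0.5, 0.0, 0.25, 0.25, 0.0]
print(count_eps_mistakes(example_gaps, 0.1))
print(regret(example_gaps))
print(regret_from_mistake_counts(example_gaps))
```

The last two printed values agree, since both compute the same sum in different ways. A classical PAC bound fixes one eps in advance and therefore controls only a single point of the step function, which is not enough to bound its integral.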
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
