Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Koulik Khamaru; Eric Xia; Martin J. Wainwright; Michael I. Jordan

arXiv:2201.08536·stat.ML·January 24, 2022·1 cites

TL;DR

This paper develops data-dependent confidence regions and an early stopping rule for reinforcement learning. The rule terminates adaptively according to the difficulty of the problem instance, which makes instance-optimal algorithms more efficient in practice.

Contribution

The paper introduces a method for converting theoretical instance-dependent guarantees into practical guidelines, in the form of an adaptive stopping rule for RL algorithms.

Findings

01. The stopping rule adapts to problem difficulty.

02. Early termination is possible for favorable problem structures.

03. The method provides sharper, data-dependent confidence regions.
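
The adaptive behavior described in these findings can be illustrated with a minimal sketch. It is not the paper's procedure: it estimates a single expected return by Monte Carlo sampling and stops once an empirical-Bernstein-style confidence half-width falls below a target accuracy. The function names, the batch schedule, the constants in the bound, and the union-bound split of the failure probability across checks are all assumptions made for illustration.

```python
import math
import random


def bernstein_half_width(samples, value_range, delta):
    """Empirical-Bernstein-style two-sided half-width (illustrative constants).

    The first term scales with the *observed* variance, so low-variance
    (easy) instances get a narrow interval from few samples.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # unbiased sample variance
    log_term = math.log(4.0 / delta)
    return (math.sqrt(2.0 * var * log_term / n)
            + 7.0 * value_range * log_term / (3.0 * (n - 1)))


def estimate_with_early_stopping(sample_return, value_range, epsilon, delta,
                                 batch_size=100, max_samples=100_000):
    """Draw returns in batches; stop when the data-dependent width is <= epsilon.

    `sample_return` is a hypothetical zero-argument callable returning one
    sampled return in an interval of length `value_range`.
    Returns (estimate, half_width, number_of_samples). If `max_samples` is
    reached first, the returned half_width may still exceed epsilon.
    """
    samples = []
    check = 0
    width = float("inf")
    while len(samples) < max_samples:
        samples.extend(sample_return() for _ in range(batch_size))
        check += 1
        # Assumed union bound over checks: the delta_k sum to at most delta.
        delta_k = delta / (check * (check + 1))
        width = bernstein_half_width(samples, value_range, delta_k)
        if width <= epsilon:
            break
    return sum(samples) / len(samples), width, len(samples)


# Two hypothetical instances with the same range but different variance.
rng = random.Random(0)
easy = estimate_with_early_stopping(lambda: rng.uniform(0.49, 0.51), 1.0, 0.05, 0.05)
hard = estimate_with_early_stopping(lambda: float(rng.random() < 0.5), 1.0, 0.05, 0.05)
print("easy instance stopped after", easy[2], "samples")
print("hard instance stopped after", hard[2], "samples")
```

Both instances have returns in [0, 1], so a worst-case sample-size bound would treat them identically. The data-dependent width lets the low-variance instance stop much earlier than the high-variance one.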

Abstract

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired a growing effort in obtaining instance-dependent guarantees and deriving instance-optimal algorithms for RL problems. This research has been carried out, however, primarily within the confines of theory, providing guarantees that explain ex post the performance differences observed. A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice. We address the problem of obtaining sharp instance-dependent confidence regions for the policy evaluation problem and the optimal value estimation problem of an MDP, given access to an instance-optimal algorithm. As a consequence, we propose a…

Taxonomy

Topics: Auction Theory and Applications · Reinforcement Learning in Robotics · Supply Chain and Inventory Management