Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time

Xiang Ji; Gen Li

arXiv:2305.15546·cs.LG·December 13, 2023·1 cites

TL;DR

This paper introduces the first regret-optimal model-free reinforcement learning algorithm for infinite-horizon discounted MDPs. The algorithm needs only a short burn-in time before its optimality guarantee applies, and it relies on variance reduction and a slow-yet-adaptive policy-switching technique.

Contribution

It presents a novel regret-optimal, model-free RL algorithm for tabular discounted MDPs in the online setting. The algorithm has a short burn-in time and avoids the high memory and computational cost incurred by existing regret-optimal methods.

Findings

1. Achieves regret optimality in discounted MDPs

2. Requires significantly less burn-in time than previous algorithms

3. Uses variance reduction and adaptive policy switching techniques

Abstract

A crucial problem in reinforcement learning is learning the optimal policy. We study this in tabular infinite-horizon discounted Markov decision processes under the online setting. The existing algorithms either fail to achieve regret optimality or have to incur a high memory and computational cost. In addition, existing optimal algorithms all require a long burn-in time in order to achieve optimal sample efficiency, i.e., their optimality is not guaranteed unless sample size surpasses a high threshold. We address both open problems by introducing a model-free algorithm that employs variance reduction and a novel technique that switches the execution policy in a slow-yet-adaptive manner. This is the first regret-optimal model-free algorithm in the discounted setting, with the additional benefit of a low burn-in time.
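
The abstract mentions switching the execution policy in a slow-yet-adaptive manner. As a rough illustration of what infrequent policy switching can look like in a tabular discounted MDP, here is a minimal Python sketch. It is not the algorithm from the paper: the doubling trigger (refresh the executed action only when a visit count reaches a power of two), the learning rate, the exploration bonus, and all names (slow_switch_q_learning, bonus_scale, and so on) are assumptions chosen for illustration, and variance reduction is left out entirely.

```python
import math
import random


def slow_switch_q_learning(P, R, gamma, T, bonus_scale=0.1, seed=0):
    """Toy optimistic Q-learning whose executed policy changes only rarely.

    P[s][a] is a list of next-state probabilities and R[s][a] a reward in [0, 1].
    Hypothetical sketch: the trigger rule, bonus, and step size are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    S, A = len(P), len(P[0])
    horizon = 1.0 / (1.0 - gamma)            # effective horizon 1 / (1 - gamma)
    Q = [[horizon] * A for _ in range(S)]    # optimistic initialization
    N = [[0] * A for _ in range(S)]          # visit counts per (state, action)
    policy = [0] * S                         # the policy that is actually executed
    switches = 0
    s = 0
    for _ in range(T):
        a = policy[s]
        s_next = rng.choices(range(S), weights=P[s][a])[0]
        N[s][a] += 1
        n = N[s][a]
        lr = (horizon + 1.0) / (horizon + n)            # assumed step size
        bonus = bonus_scale * horizon / math.sqrt(n)    # assumed exploration bonus
        target = R[s][a] + gamma * max(Q[s_next]) + bonus
        Q[s][a] = min(horizon, (1.0 - lr) * Q[s][a] + lr * target)
        # Slow switching: reconsider the executed action at s only when the
        # visit count of (s, a) is a power of two.
        if (n & (n - 1)) == 0:
            greedy = max(range(A), key=lambda b: Q[s][b])
            if greedy != policy[s]:
                policy[s] = greedy
                switches += 1
        s = s_next
    return Q, policy, switches
```

In this sketch the Q-table is updated at every step, but the executed policy can change at most S * A * (floor(log2(T)) + 1) times, because a switch is only possible when some visit count hits a power of two. How the paper makes its switching rule adaptive, and how it combines it with variance reduction, is described in the paper itself and is not reproduced here.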



Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management