Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes

Zihan Zhang; Qiaomin Xie

arXiv:2306.16394 · cs.LG · June 29, 2023 · 2 cites

TL;DR

This paper introduces new provably efficient model-free reinforcement learning algorithms for infinite-horizon average-reward MDPs, achieving optimal regret bounds in online and simulator settings with novel techniques.

Contribution

The paper develops the first algorithms with optimal T-dependence for weakly communicating MDPs and introduces two new techniques for average-reward RL.

Findings

01

Achieves $\tilde{O}(S^5 A^2 \text{sp}(h^*) \sqrt{T})$ regret in the online setting.

02

Achieves sample complexity bounds close to the minimax lower bound in the simulator setting.

03

Introduces value-difference estimation and confidence region construction techniques.
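
To make the quantities in the first finding concrete, the following Python sketch computes the span of a bias vector, sp(h) = max_s h(s) - min_s h(s), and evaluates the leading term S^5 A^2 sp(h*) sqrt(T) of the regret bound with constants and logarithmic factors dropped. The function names and the example bias vector are illustrative assumptions, not taken from the paper.

```python
import math


def span(h):
    """Span of a bias vector: largest entry minus smallest entry."""
    return max(h) - min(h)


def regret_leading_term(S, A, sp_h, T):
    """S^5 * A^2 * sp(h*) * sqrt(T), ignoring constants and log factors."""
    return S**5 * A**2 * sp_h * math.sqrt(T)


# Hypothetical bias values for a 3-state MDP (illustration only).
h_example = [0.0, 1.5, -0.5]
sp_h = span(h_example)

# Compare the leading term at two horizons that differ by a factor of 4.
r1 = regret_leading_term(S=3, A=2, sp_h=sp_h, T=10_000)
r4 = regret_leading_term(S=3, A=2, sp_h=sp_h, T=40_000)
ratio = r4 / r1
```

Under these assumptions, quadrupling T doubles the leading term, so the average regret per step shrinks at the rate 1/sqrt(T).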

Abstract

We develop several provably efficient model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov Decision Processes (MDPs). We consider both the online setting and the setting with access to a simulator. In the online setting, we propose model-free RL algorithms based on reference-advantage decomposition. Our algorithm achieves $\tilde{O}(S^{5} A^{2} \text{sp}(h^{*}) \sqrt{T})$ regret after $T$ steps, where $S \times A$ is the size of the state-action space and $\text{sp}(h^{*})$ is the span of the optimal bias function. Our results are the first to achieve optimal dependence in $T$ for weakly communicating MDPs. In the simulator setting, we propose a model-free RL algorithm that finds an $\epsilon$-optimal policy using $O\left(\frac{S A \text{sp}^{2}(h^{*})}{\epsilon^{2}} + \frac{S^{2} A \text{sp}(h^{*})}{\epsilon}\right)$ samples, whereas the minimax lower…
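
The sample complexity expression in the abstract has two terms, and a short calculation shows which one dominates: S A sp^2(h*) / eps^2 is at least S^2 A sp(h*) / eps exactly when eps <= sp(h*) / S. The Python sketch below checks this numerically, with constants and logarithmic factors dropped. The function name and the example values of S, A, sp(h*), and eps are hypothetical.

```python
def sample_complexity_terms(S, A, sp_h, eps):
    """Return the two terms of the simulator-setting bound, without constants."""
    first = S * A * sp_h**2 / eps**2   # S A sp^2(h*) / eps^2
    second = S**2 * A * sp_h / eps     # S^2 A sp(h*) / eps
    return first, second


# Hypothetical problem size (illustration only).
S, A, sp_h = 10, 5, 2.0
threshold = sp_h / S  # eps below this value makes the first term dominate

small_first, small_second = sample_complexity_terms(S, A, sp_h, eps=0.1)
large_first, large_second = sample_complexity_terms(S, A, sp_h, eps=0.5)

first_dominates_at_small_eps = small_first > small_second
second_dominates_at_large_eps = large_second > large_first
```

With these values the threshold is 0.2, so the 1/eps^2 term dominates at eps = 0.1 and the 1/eps term dominates at eps = 0.5.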

Taxonomy

Topics: Age of Information Optimization · Reinforcement Learning in Robotics