Achieving Tractable Minimax Optimal Regret in Average Reward MDPs

Victor Boone; Zihan Zhang

arXiv:2406.01234·cs.LG·June 4, 2024


TL;DR

This paper introduces a computationally tractable algorithm for average-reward MDPs that achieves minimax optimal regret without prior knowledge of the bias span, whereas earlier algorithms were either suboptimal in regret or computationally inefficient.

Contribution

The paper presents the first tractable algorithm with minimax optimal regret for average-reward MDPs. It is built on a novel subroutine, Projected Mitigated Extended Value Iteration (PMEVI), which can also be applied to earlier algorithms to improve their regret bounds.

Findings

01. Achieves regret $\widetilde{O}(\sqrt{\mathrm{sp}(h^{*}) S A T})$ with high probability.

02. Does not require prior knowledge of the bias span $\mathrm{sp}(h^{*})$.

03. Introduces PMEVI, a new subroutine for bias-constrained policy computation.
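
For reference, the quantities named in the findings can be written out under the standard average-reward conventions. These definitions are assumed here and are not quoted from the paper, whose notation may differ: $g^{*}$ denotes the optimal gain (long-run average reward), $r_t$ the reward collected at step $t$, and $h^{*}$ the optimal bias function.

$$
\mathrm{Reg}(T) = T g^{*} - \sum_{t=1}^{T} r_t,
\qquad
\mathrm{sp}(h^{*}) = \max_{s} h^{*}(s) - \min_{s} h^{*}(s).
$$

The notation $\widetilde{O}(\cdot)$ hides logarithmic factors. For communicating MDPs with rewards in $[0, 1]$, the span satisfies $\mathrm{sp}(h^{*}) \le D$, where $D$ is the diameter, so a bound stated in terms of the span is never worse than the same bound stated in terms of the diameter.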

Abstract

In recent years, significant attention has been directed towards learning average-reward Markov Decision Processes (MDPs). However, existing algorithms either suffer from sub-optimal regret guarantees or computational inefficiencies. In this paper, we present the first tractable algorithm with minimax optimal regret of $\widetilde{O}(\sqrt{\mathrm{sp}(h^{*}) S A T})$, where $\mathrm{sp}(h^{*})$ is the span of the optimal bias function $h^{*}$, $S \times A$ is the size of the state-action space, and $T$ is the number of learning steps. Remarkably, our algorithm does not require prior information on $\mathrm{sp}(h^{*})$. Our algorithm relies on a novel subroutine, Projected Mitigated Extended Value Iteration (PMEVI), to compute bias-constrained optimal policies efficiently. This subroutine can be applied to various previous algorithms to improve regret bounds.
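
To make the span $\mathrm{sp}(h^{*})$ concrete, the following is a minimal sketch that computes the optimal gain and bias of a small, fully known MDP by plain relative value iteration. It is not PMEVI and not the paper's algorithm, which operates on an unknown MDP with confidence regions. The toy transition tensor `P`, the reward matrix `r`, and the function names are all hypothetical and chosen only for illustration.

```python
import numpy as np


def span(h):
    """Span of a vector: max(h) - min(h)."""
    return float(np.max(h) - np.min(h))


def relative_value_iteration(P, r, tol=1e-10, max_iter=100_000):
    """Gain and bias of a known, aperiodic average-reward MDP.

    P has shape (S, A, S) with P[s, a, s'] the transition probability,
    r has shape (S, A). Returns (gain, bias) with min(bias) == 0.
    """
    h = np.zeros(P.shape[0])
    for _ in range(max_iter):
        q = r + P @ h              # Bellman backup, shape (S, A)
        h_new = q.max(axis=1)      # greedy value per state
        diff = h_new - h
        # Stop once the increment is (nearly) the same in every state:
        # that common increment is the gain.
        if span(diff) < tol:
            gain = 0.5 * (diff.max() + diff.min())
            return float(gain), h_new - h_new.min()
        h = h_new - h_new.min()    # re-center to keep values bounded
    raise RuntimeError("relative value iteration did not converge")


# Hypothetical two-state, two-action MDP (illustration only).
# State 1 pays reward 1, state 0 pays reward 0.
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],     # from state 0: action 0, action 1
    [[0.1, 0.9], [0.5, 0.5]],     # from state 1: action 0, action 1
])
r = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
])

gain, bias = relative_value_iteration(P, r)
bias_span = span(bias)
print(f"gain = {gain:.6f}, sp(h*) = {bias_span:.6f}")
```

Solving the two Bellman equations of this toy instance by hand gives a gain of $5/6$ and a span of $5/3$, which the sketch should reproduce up to numerical tolerance. In the learning setting described in the abstract, neither quantity is known in advance, which is why not requiring $\mathrm{sp}(h^{*})$ as an input matters.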


Videos

Achieving Tractable Minimax Optimal Regret in Average Reward MDPs · slideslive
