Logarithmic Regret of Exploration in Average Reward Markov Decision Processes

Victor Boone; Bruno Gaujal

arXiv:2502.06480·cs.LG·December 1, 2025

TL;DR

This paper shows that, in average reward Markov decision processes, ending episodes with the Vanishing Multiplicative (VM) rule instead of the Doubling Trick (DT) gives regret that is as good if not better, in theory and in practice, while greatly improving one-shot behavior.

Contribution

The paper introduces the Vanishing Multiplicative (VM) rule for episode management and shows its advantages over the traditional Doubling Trick (DT), leaving the Extended Value Iteration (EVI) subroutine unmodified.

Findings

01 With the VM rule, regret during bad episodes becomes logarithmic.

02 The VM rule greatly improves one-shot behavior.

03 Both theoretical and empirical results favor VM over DT.

Abstract

In average reward Markov decision processes, state-of-the-art algorithms for regret minimization follow a well-established framework: they are model-based, optimistic and episodic. First, they maintain a confidence region from which optimistic policies are computed using a well-known subroutine called Extended Value Iteration (EVI). Second, these policies are used over time windows called episodes, each ended by the Doubling Trick (DT) rule or a variant thereof. In this work, without modifying EVI, we show that there is a significant advantage in replacing (DT) with another simple rule, which we call the Vanishing Multiplicative (VM) rule. When managing episodes with (VM), the algorithm's regret is, both in theory and in practice, as good if not better than with (DT), while the one-shot behavior is greatly improved. More specifically, the management of bad episodes (when sub-optimal…
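
To make the contrast between the two episode-ending rules concrete, here is a minimal Python sketch. The abstract does not define either rule precisely, so both functions are assumptions. `dt_should_stop` follows the common UCRL2-style convention, where an episode ends once the visits to a state-action pair within the episode reach its visit count at the start of the episode. `vm_should_stop` is only an illustrative guess at a "vanishing multiplicative" rule: the doubling factor is replaced by a factor `f(n)` that shrinks as the count `n` grows. The choice `f(n) = 1/sqrt(n)` and all function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List


def dt_should_stop(count_at_start: int, count_in_episode: int) -> bool:
    """Doubling Trick (UCRL2-style convention, assumed here): stop once the
    in-episode visits to a pair reach its visit count at episode start."""
    return count_in_episode >= max(1, count_at_start)


def assumed_vanishing_factor(n: int) -> float:
    """Hypothetical vanishing factor f(n) = 1/sqrt(n); not from the paper."""
    return 1.0 / math.sqrt(n)


def vm_should_stop(count_at_start: int, count_in_episode: int) -> bool:
    """Assumed VM-style rule: stop once in-episode visits reach f(n) * n,
    where n is the count at episode start and f(n) tends to 0."""
    n = max(1, count_at_start)
    return count_in_episode >= max(1.0, assumed_vanishing_factor(n) * n)


def episode_lengths(rule: Callable[[int, int], bool], total_visits: int) -> List[int]:
    """Toy simulation: a single state-action pair is visited at every step.
    Returns the lengths of completed episodes (a trailing partial one is dropped)."""
    lengths = []
    count_at_start = 0
    count_in_episode = 0
    for _ in range(total_visits):
        count_in_episode += 1
        if rule(count_at_start, count_in_episode):
            lengths.append(count_in_episode)
            count_at_start += count_in_episode
            count_in_episode = 0
    return lengths


dt_lengths = episode_lengths(dt_should_stop, 10_000)
vm_lengths = episode_lengths(vm_should_stop, 10_000)
print("DT:", len(dt_lengths), "episodes, longest", max(dt_lengths))
print("VM:", len(vm_lengths), "episodes, longest", max(vm_lengths))
```

In this toy setting the doubling rule produces a few episodes whose lengths grow geometrically, while the assumed vanishing-factor rule produces many more, much shorter episodes. A policy chosen at the start of an episode is therefore revised sooner, which is one way to read the abstract's remark about better management of bad episodes.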

Taxonomy

Topics: Advanced Data Processing Techniques · Reservoir Engineering and Simulation Methods · Process Optimization and Integration