Span-Agnostic Optimal Sample Complexity and Oracle Inequalities for Average-Reward RL

Matthew Zurek; Yudong Chen

arXiv:2502.11238·cs.LG·June 2, 2025

TL;DR

This paper introduces algorithms for average-reward MDPs that achieve the optimal span-based sample complexity without prior knowledge of the span of the optimal bias function, using horizon calibration and span penalization.

Contribution

The authors develop the first algorithms matching the optimal span-based complexity without prior knowledge of the span, advancing the theoretical understanding of sample efficiency in average-reward RL.

Findings

01 Algorithms achieve the minimax optimal span-based sample complexity without knowing $H$, the span of the optimal bias function.

02 Horizon calibration tunes the effective horizon automatically from data.

03 Span penalization can achieve complexity better than the minimax rate in certain settings.

Abstract

We study the sample complexity of finding an $ε$-optimal policy in average-reward Markov Decision Processes (MDPs) with a generative model. The minimax optimal span-based complexity of $O(SAH/ε^{2})$, where $H$ is the span of the optimal bias function, has only been achievable with prior knowledge of the value of $H$. Prior-knowledge-free algorithms have been the objective of intensive research, but several natural approaches provably fail to achieve this goal. We resolve this problem, developing the first algorithms matching the optimal span-based complexity without knowledge of $H$, both when the dataset size is fixed and when the suboptimality level $ε$ is fixed. Our main technique combines the discounted reduction approach with a method for automatically tuning the effective horizon based on empirical confidence intervals or lower bounds on…
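
To make the "discounted reduction" idea concrete, here is a minimal Python sketch. It is not the authors' algorithm. It draws samples from a generative model, builds an empirical transition kernel, and solves the resulting discounted MDP by value iteration with a hand-picked discount factor. The function names (`sample_empirical_model`, `discounted_value_iteration`, `span`), the toy two-state MDP, and the fixed choice `gamma = 0.9` are all assumptions made for illustration. The paper's contribution, choosing the effective horizon $1/(1-γ)$ automatically from data, is not implemented here.

```python
import numpy as np

def sample_empirical_model(P, n, rng):
    """Draw n next-state samples per (state, action) pair from the true
    kernel P (shape S x A x S) and return the empirical kernel."""
    S, A, _ = P.shape
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            counts = rng.multinomial(n, P[s, a])
            P_hat[s, a] = counts / n
    return P_hat

def discounted_value_iteration(P, r, gamma, tol=1e-8, max_iter=100_000):
    """Standard value iteration for a discounted MDP.
    Returns the value function and a greedy policy."""
    S = P.shape[0]
    V = np.zeros(S)
    for _ in range(max_iter):
        Q = r + gamma * (P @ V)          # Q has shape (S, A)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = r + gamma * (P @ V)
    return V, Q.argmax(axis=1)

def span(v):
    """Span seminorm: max entry minus min entry."""
    return float(np.max(v) - np.min(v))

# Hypothetical toy MDP: action a moves deterministically to state a,
# and only (state 0, action 0) earns reward 1.
P_true = np.array([[[1.0, 0.0], [0.0, 1.0]],
                   [[1.0, 0.0], [0.0, 1.0]]])
r = np.array([[1.0, 0.0],
              [0.0, 0.0]])

rng = np.random.default_rng(0)
P_hat = sample_empirical_model(P_true, n=50, rng=rng)

gamma = 0.9                               # effective horizon 1 / (1 - gamma) = 10
V_hat, policy = discounted_value_iteration(P_hat, r, gamma)
print("greedy policy:", policy)
print("span of discounted value function:", span(V_hat))
```

In this toy example the span of the discounted value function comes out at approximately 1, far below the effective horizon of 10. Span-based bounds are designed to exploit this kind of gap, which is why a data-driven choice of the horizon matters.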

Taxonomy

Topics: Fault Detection and Control Systems · Advanced Statistical Process Monitoring