Near-Optimal Sample Complexity for MDPs via Anchoring

Jongmin Lee; Mario Bravo; Roberto Cominetti

arXiv:2502.04477 · math.OC · June 16, 2025

TL;DR

This paper introduces a new model-free algorithm for average reward MDPs that achieves near-optimal sample complexity without prior knowledge of the bias span, combining Halpern's anchored iteration with recursive sampling.

Contribution

It presents the first model-free algorithm with near-optimal sample complexity for average reward MDPs that does not require prior knowledge of the bias span.

Findings

01

Achieves sample complexity $\widetilde{O}(|\mathcal{S}||\mathcal{A}|\|h^{*}\|_{\mathrm{sp}}^{2}/\varepsilon^{2})$, matching the known lower bound up to a factor of $\|h^{*}\|_{\mathrm{sp}}$

02

Requires no prior knowledge of the bias span and guarantees finite termination

03

Extends techniques to discounted MDPs

Abstract

We study a new model-free algorithm to compute $\varepsilon$-optimal policies for average reward Markov decision processes, in the weakly communicating case. Given a generative model, our procedure combines a recursive sampling technique with Halpern's anchored iteration, and computes an $\varepsilon$-optimal policy with sample and time complexity $\widetilde{O}(|\mathcal{S}||\mathcal{A}|\|h^{*}\|_{\mathrm{sp}}^{2}/\varepsilon^{2})$ both in high probability and in expectation. To our knowledge, this is the best complexity among model-free algorithms, matching the known lower bound up to a factor $\|h^{*}\|_{\mathrm{sp}}$. Although the complexity bound involves the span seminorm $\|h^{*}\|_{\mathrm{sp}}$ of the unknown bias vector, the algorithm requires no prior knowledge and implements a stopping rule which guarantees with probability 1 that the procedure terminates in finite time. We also analyze…
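To make the anchoring idea concrete, here is a minimal sketch of a Halpern-style anchored value iteration on a tiny average reward MDP. It is not the paper's algorithm: it applies the exact Bellman operator with a known transition kernel instead of the recursive sampling from a generative model, and the anchoring schedule $\beta_k = 1/(k+1)$, the span-residual stopping test, and the names `span`, `bellman`, and `anchored_value_iteration` are all assumptions chosen for illustration.

```python
import numpy as np


def span(v):
    """Span seminorm of a vector: max(v) - min(v)."""
    return float(np.max(v) - np.min(v))


def bellman(h, P, r):
    """Average reward Bellman operator T(h)(s) = max_a [r[s, a] + sum_s' P[s, a, s'] h[s']].

    P has shape (S, A, S) and r has shape (S, A).
    """
    return np.max(r + P @ h, axis=1)


def anchored_value_iteration(P, r, tol=1e-2, max_iter=10_000):
    """Halpern iteration h_{k+1} = beta_k * h_0 + (1 - beta_k) * T(h_k).

    Stops once span(T(h) - h) <= tol. Both the schedule beta_k = 1/(k+1)
    and this stopping test are illustrative assumptions, not the paper's rule.
    """
    h0 = np.zeros(P.shape[0])  # anchor point
    h = h0.copy()
    iters = 0
    for k in range(1, max_iter + 1):
        iters = k
        Th = bellman(h, P, r)
        if span(Th - h) <= tol:
            break
        beta = 1.0 / (k + 1)
        h = beta * h0 + (1.0 - beta) * Th  # pull the update back toward the anchor
    q = r + P @ h                      # state-action values at the final iterate
    residual = span(np.max(q, axis=1) - h)
    policy = np.argmax(q, axis=1)      # greedy policy with respect to h
    return h, policy, residual, iters


# Hypothetical two-state, two-action example.
# State 0: action 0 earns reward 1 and stays; action 1 earns 0 and stays.
# State 1: action 0 moves to state 0; action 1 stays in state 1 (both reward 0).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 0] = 1.0
P[1, 0, 0] = 1.0
P[1, 1, 1] = 1.0
r = np.array([[1.0, 0.0], [0.0, 0.0]])

h, policy, residual, iters = anchored_value_iteration(P, r, tol=1e-2)
print("greedy policy:", policy, "span residual:", residual, "iterations:", iters)
```

The span seminorm is the natural error measure here because the Bellman operator commutes with adding a constant to every entry of $h$, so the iterates are only meaningful up to a constant shift.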

Videos

Near-Optimal Sample Complexity for MDPs via Anchoring · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Water Systems and Optimization · Sparse and Compressive Sensing Techniques