Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning

Yang Xu; Washim Uddin Mondal; Vaneet Aggarwal

arXiv:2502.16816 · stat.ML · December 11, 2025

TL;DR

This paper provides the first finite-sample analysis for policy evaluation in robust average-reward MDPs, establishing sample complexity bounds and introducing a novel MLMC-based estimation method with finite expected samples.

Contribution

It introduces a finite-sample analysis framework for robust average-reward policy evaluation and develops a new MLMC-based estimator with controlled bias and finite expected samples.

Findings

01. Achieves order-optimal sample complexity of Õ(ε^{-2})

02. Introduces a truncation mechanism ensuring finite expected samples

03. Establishes the contraction property of the robust Bellman operator

Abstract

We present the first finite-sample analysis of policy evaluation in robust average-reward Markov Decision Processes (MDPs). Prior work in this setting has established only asymptotic convergence guarantees, leaving open the question of sample complexity. In this work, we address this gap by showing that the robust Bellman operator is a contraction under a carefully constructed semi-norm, and by developing a stochastic approximation framework with controlled bias. Our approach builds upon Multi-Level Monte Carlo (MLMC) techniques to estimate the robust Bellman operator efficiently. To overcome the infinite expected sample complexity inherent in standard MLMC, we introduce a truncation mechanism based on a geometric distribution, ensuring a finite expected sample complexity while maintaining a small bias that decays exponentially with the truncation level. Our method achieves the…
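The truncation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: the names `level_probs`, `expected_samples`, `truncated_mlmc`, `sample_fn`, and `f` are hypothetical, and the sketch assumes the level is N = min(N', n_max) with P(N' = n) = 2^-(n+1), that level n uses 2^(n+1) samples, and that the correction term is weighted by the exact probability of the drawn level. Here `f` stands in for any plug-in estimate computed from a batch of samples; in the robust setting it would be a functional of the empirical transition distribution.

```python
import numpy as np


def level_probs(n_max):
    """pmf of N = min(N', n_max), assuming P(N' = n) = 2^-(n+1) for n = 0, 1, ..."""
    head = [2.0 ** -(n + 1) for n in range(n_max)]
    tail = [2.0 ** -n_max]  # all remaining mass is placed on the truncation level
    return np.array(head + tail)


def expected_samples(n_max):
    """Expected number of samples per estimate when level n draws 2^(n+1) samples."""
    probs = level_probs(n_max)
    sizes = 2.0 ** (np.arange(n_max + 1) + 1)
    return float(probs @ sizes)


def truncated_mlmc(sample_fn, f, n_max, rng):
    """One truncated-MLMC estimate of the limit of f as the batch size grows.

    sample_fn(m, rng) returns m i.i.d. samples as an array (hypothetical helper);
    f maps a batch of samples to a plug-in estimate.
    """
    probs = level_probs(n_max)
    n = rng.choice(n_max + 1, p=probs)      # random level
    x = sample_fn(2 ** (n + 1), rng)        # batch for this level
    # Multi-level correction: full batch versus the average of its two halves.
    delta = f(x) - 0.5 * (f(x[0::2]) + f(x[1::2]))
    # Base estimate from one sample plus the importance-weighted correction.
    return f(x[:1]) + delta / probs[n]
```

Under these assumptions each level below `n_max` contributes 2^-(n+1) · 2^(n+1) = 1 to the expected sample count and the truncation level contributes 2, so `expected_samples(n_max)` equals `n_max + 2`; without truncation the same sum has infinitely many unit terms and diverges. The corrections telescope in expectation, so the estimate has the same mean as a plug-in estimate built from 2^(n_max + 1) samples, which is where a bias that shrinks exponentially in the truncation level comes from.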

Taxonomy

Topics: Supply Chain and Inventory Management