Provably Sample-Efficient Robust Reinforcement Learning with Average Reward

Zachary Roch; Chi Zhang; George Atia; Yue Wang

arXiv:2505.12462 · cs.LG · September 26, 2025

TL;DR

This paper introduces RHI, a new algorithm for robust average-reward reinforcement learning that achieves near-optimal sample complexity with weaker assumptions and no prior knowledge, advancing theoretical understanding of data-efficient robust RL.

Contribution

The paper presents RHI, a robust RL algorithm that needs weaker structural assumptions than prior methods and attains near-optimal sample complexity, filling a key gap in the finite-sample analysis of robust average-reward RL.

Findings

01. RHI achieves the tightest known sample complexity bound.

02. It requires only the communicating condition, not ergodicity.

03. No prior knowledge of the MDP is needed for RHI.

Abstract

Robust reinforcement learning (RL) under the average-reward criterion is essential for long-term decision-making, particularly when the environment may differ from its specification. However, a significant gap exists in understanding the finite-sample complexity of these methods, as most existing work provides only asymptotic guarantees. This limitation hinders their principled understanding and practical deployment, especially in data-limited scenarios. We close this gap by proposing Robust Halpern Iteration (RHI), a new algorithm designed for robust Markov Decision Processes (MDPs) with transition uncertainty characterized by $\ell_p$-norm and contamination models. Our approach offers three key advantages over previous methods: (1) Weaker Structural Assumptions: RHI only requires the underlying robust MDP to be communicating, a less restrictive condition than the commonly…
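To make the ingredients named in the abstract concrete, here is a minimal planning sketch of a Halpern-style (anchored) iteration on a robust Bellman operator under a contamination model. It is not the paper's RHI, which works from samples; it assumes the nominal model is fully known. The function names, the anchoring weights 1/(k+2), the recentering at state 0, and the toy two-state MDP are all illustrative assumptions and do not come from the paper.

```python
import numpy as np

def robust_bellman_contamination(h, r, P, delta):
    """Robust Bellman operator for a delta-contamination uncertainty set (assumed form).

    r: (S, A) rewards, P: (S, A, S) nominal transition kernel, h: (S,) relative values.
    For the set {(1 - delta) * P + delta * q : q any distribution}, the worst case
    puts the contaminating mass q on a state that minimizes h.
    """
    worst_case_next = (1.0 - delta) * (P @ h) + delta * h.min()  # shape (S, A)
    return (r + worst_case_next).max(axis=1)

def halpern_iteration(r, P, delta, num_iters=5000):
    """Anchored iteration h_{k+1} = beta_k * h_0 + (1 - beta_k) * T(h_k)."""
    h0 = np.zeros(r.shape[0])  # anchor point
    h = h0.copy()
    for k in range(num_iters):
        beta = 1.0 / (k + 2)  # assumed anchoring schedule
        Th = robust_bellman_contamination(h, r, P, delta)
        Th = Th - Th[0]  # recenter at state 0 so the iterates stay bounded
        h = beta * h0 + (1.0 - beta) * Th
    # At an exact solution, T(h) - h is a constant vector equal to the robust gain.
    diff = robust_bellman_contamination(h, r, P, delta) - h
    gain = 0.5 * (diff.max() + diff.min())
    residual = diff.max() - diff.min()  # span of the Bellman residual
    return gain, h, residual

# Hypothetical toy problem: 2 states, 2 actions.
delta = 0.1
r = np.array([[1.0, 0.0],
              [0.0, 0.5]])
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.2, 0.8]]])

gain, h, residual = halpern_iteration(r, P, delta)
print("estimated robust gain:", gain)
print("span of Bellman residual:", residual)
```

The span of the residual serves as the stopping diagnostic here because average-reward value functions are only defined up to an additive constant. A sample-based method would replace the exact operator with estimates built from data.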

Taxonomy

Topics: Reinforcement Learning in Robotics · Traffic control and management · Adaptive Dynamic Programming Control