Provably Sample-Efficient Robust Reinforcement Learning with Average Reward
Zachary Roch, Chi Zhang, George Atia, Yue Wang

TL;DR
This paper introduces RHI, a new algorithm for robust average-reward reinforcement learning that achieves near-optimal sample complexity with weaker assumptions and no prior knowledge, advancing theoretical understanding of data-efficient robust RL.
Contribution
The paper presents RHI, a robust RL algorithm that needs weaker structural assumptions than prior methods and attains near-optimal sample complexity, filling a key gap in finite-sample analysis for robust average-reward RL.
Findings
RHI achieves the tightest known sample complexity bound.
It requires only the communicating condition, not ergodicity.
No prior knowledge of the MDP is needed for RHI.
Abstract
Robust reinforcement learning (RL) under the average-reward criterion is essential for long-term decision-making, particularly when the environment may differ from its specification. However, a significant gap exists in understanding the finite-sample complexity of these methods, as most existing work provides only asymptotic guarantees. This limitation hinders their principled understanding and practical deployment, especially in data-limited scenarios. We close this gap by proposing Robust Halpern Iteration (RHI), a new algorithm designed for robust Markov Decision Processes (MDPs) with transition uncertainty characterized by $\ell_p$-norm and contamination models. Our approach offers three key advantages over previous methods: (1) Weaker Structural Assumptions: RHI only requires the underlying robust MDP to be communicating, a less restrictive condition than the commonly…
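To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of a Halpern-style anchored iteration applied to a robust average-reward Bellman operator under a contamination model. It is not the paper's RHI algorithm: RHI is analyzed in a sample-based setting, whereas this sketch assumes the nominal transition kernel is fully known and applies the operator exactly. The names `robust_bellman` and `halpern_robust_gain`, the step size 1/(k+2), the contamination level `delta`, and the two-state MDP are all illustrative assumptions.

```python
import numpy as np


def robust_bellman(h, r, P0, delta):
    """Robust Bellman operator under a contamination model (illustrative).

    Assumed uncertainty set: {(1 - delta) * P0(.|s,a) + delta * q : q any distribution}.
    The worst case over q puts all contaminated mass on the state with the smallest h.
    Shapes: h (S,), r (S, A), P0 (S, A, S).
    """
    worst_case_next = (1.0 - delta) * (P0 @ h) + delta * h.min()
    return (r + worst_case_next).max(axis=1)


def halpern_robust_gain(r, P0, delta, num_iters=2000):
    """Halpern (anchored) iteration: h_{k+1} = beta_k * h_0 + (1 - beta_k) * T(h_k).

    Returns a gain estimate, a bias estimate shifted so its minimum is zero,
    and the span of the Bellman residual as a convergence check.
    """
    h0 = np.zeros(r.shape[0])  # anchor point (assumed choice)
    h = h0.copy()
    for k in range(num_iters):
        beta = 1.0 / (k + 2)  # assumed anchoring schedule
        h = beta * h0 + (1.0 - beta) * robust_bellman(h, r, P0, delta)
    residual = robust_bellman(h, r, P0, delta) - h
    gain = 0.5 * (residual.max() + residual.min())
    span = residual.max() - residual.min()
    return gain, h - h.min(), span


# Hypothetical two-state communicating MDP: action 0 stays, action 1 switches.
P0 = np.zeros((2, 2, 2))
P0[0, 0, 0] = 1.0  # state 0, stay
P0[0, 1, 1] = 1.0  # state 0, switch to state 1
P0[1, 0, 1] = 1.0  # state 1, stay
P0[1, 1, 0] = 1.0  # state 1, switch to state 0
r = np.array([[0.0, 0.0],
              [1.0, 0.0]])  # only staying in state 1 pays a reward

gain, bias, span = halpern_robust_gain(r, P0, delta=0.2)
print(gain, bias, span)
```

For this toy problem the robust optimality equation can be solved by hand: the bias difference between the two states is 1 and the robust gain is 1 - delta, so the printed gain estimate should approach 0.8 as `num_iters` grows, while the residual span shrinks toward zero.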
Taxonomy
Topics: Reinforcement Learning in Robotics · Traffic control and management · Adaptive Dynamic Programming Control
