Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning

Yang Xu; Swetha Ganesh; Vaneet Aggarwal

arXiv:2506.07040 · cs.LG · December 11, 2025

TL;DR

This paper develops non-asymptotic convergence guarantees for robust $Q$-learning and actor-critic algorithms in average reward MDPs under distributional uncertainties, enabling efficient robust policy learning.

Contribution

It establishes that the optimal robust $Q$ operator is a strict contraction with respect to a suitably designed semi-norm, and provides sample-efficient algorithms for robust policy optimization under contamination, TV, and Wasserstein uncertainty sets.

Findings

01

The optimal robust $Q$ operator is a strict contraction with respect to a carefully designed semi-norm.

02

The algorithms achieve $\tilde{O}(\epsilon^{-2})$ sample complexity.

03

Numerical simulations demonstrate effectiveness.

Abstract

We present a non-asymptotic convergence analysis of $Q$-learning and actor-critic algorithms for robust average-reward Markov Decision Processes (MDPs) under contamination, total-variation (TV) distance, and Wasserstein uncertainty sets. A key ingredient of our analysis is showing that the optimal robust $Q$ operator is a strict contraction with respect to a carefully designed semi-norm (with constant functions quotiented out). This property enables a stochastic approximation update that learns the optimal robust $Q$-function using $\tilde{O}(\epsilon^{-2})$ samples. We also provide an efficient routine for robust $Q$-function estimation, which in turn facilitates robust critic estimation. Building on this, we introduce an actor-critic algorithm that learns an $\epsilon$-optimal robust policy within $\tilde{O}(\epsilon^{-2})$ samples. We provide numerical simulations…
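To make the contraction idea concrete, here is a minimal sketch and not the paper's algorithm. It assumes a small random tabular MDP, a contamination uncertainty set with a hypothetical radius `delta`, and the span semi-norm (max minus min) as a stand-in for the paper's carefully designed semi-norm, which may differ. It also uses an exact, model-based backup rather than the sample-based stochastic approximation update analyzed in the paper. All names (`span`, `robust_backup`, `kernel`, `rewards`) are illustrative.

```python
import random


def span(q):
    """Span semi-norm: max minus min over all entries (zero on constants)."""
    flat = [x for row in q for x in row]
    return max(flat) - min(flat)


def diff(q1, q2):
    """Entrywise difference of two Q tables."""
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(q1, q2)]


def robust_backup(q, rewards, kernel, delta):
    """One exact robust backup under delta-contamination (assumed setup).

    For the set {(1 - delta) * p + delta * q'}, the worst-case expected next
    value is (1 - delta) * E_p[V] + delta * min_s V(s), with
    V(s) = max_a Q(s, a). Subtracting the reference entry q[0][0] keeps the
    iterates bounded, in the style of relative value iteration.
    """
    v = [max(row) for row in q]
    v_min = min(v)
    ref = q[0][0]
    new_q = []
    for s in range(len(q)):
        row = []
        for a in range(len(q[s])):
            nominal = sum(p * v[s2] for s2, p in enumerate(kernel[s][a]))
            row.append(rewards[s][a] + (1 - delta) * nominal
                       + delta * v_min - ref)
        new_q.append(row)
    return new_q


random.seed(0)
n_states, n_actions, delta = 4, 2, 0.2

# Random rewards in [0, 1) and a random nominal transition kernel.
rewards = [[random.random() for _ in range(n_actions)]
           for _ in range(n_states)]
kernel = []
for _ in range(n_states):
    per_action = []
    for _ in range(n_actions):
        w = [random.random() for _ in range(n_states)]
        total = sum(w)
        per_action.append([x / total for x in w])
    kernel.append(per_action)

# Two arbitrary Q tables: one backup should shrink the span of their gap.
q1 = [[random.uniform(-5, 5) for _ in range(n_actions)]
      for _ in range(n_states)]
q2 = [[random.uniform(-5, 5) for _ in range(n_actions)]
      for _ in range(n_states)]
before = span(diff(q1, q2))
after = span(diff(robust_backup(q1, rewards, kernel, delta),
                  robust_backup(q2, rewards, kernel, delta)))
print("span before:", before, "span after:", after)

# Iterating the backup drives the span of successive differences to zero.
q = [[0.0] * n_actions for _ in range(n_states)]
for _ in range(200):
    q = robust_backup(q, rewards, kernel, delta)
residual = span(diff(robust_backup(q, rewards, kernel, delta), q))
print("residual span:", residual)
```

In this contamination sketch the gap between two backed-up tables shrinks by at least a factor of `1 - delta` in span, because the contamination term and the reference offset are constants that the semi-norm ignores. The TV and Wasserstein cases in the paper require a different worst-case computation and are not covered here.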

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Risk and Portfolio Optimization