Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach

Swetha Ganesh; Vaneet Aggarwal

arXiv:2505.19986 · cs.LG · October 28, 2025

TL;DR

This paper introduces NAC-B, a scalable Natural Actor-Critic algorithm with batching that achieves near-optimal regret bounds for infinite-horizon average-reward unichain MDPs, even with periodicity and transient states.

Contribution

It presents NAC-B, a novel actor-critic method with batching that works under the weak unichain assumption and scales to large state spaces, with theoretical regret guarantees.

Findings

01

Achieves order-optimal $\tilde{O}(\sqrt{T})$ regret in unichain MDPs.

02

Handles MDPs with periodicity and transient states.

03

Formalizes benefits of batching via convergence constants.
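
The variance-reduction side of batching can be illustrated with a toy sketch. This is not the NAC-B estimator: it assumes a hypothetical scalar gradient observed with independent Gaussian noise, whereas samples along a Markov chain are correlated. The names `noisy_gradient`, `batched_gradient`, and `estimator_variance` are made up for illustration.

```python
import random
import statistics


def noisy_gradient(rng, true_grad=1.0, noise_std=2.0):
    # Hypothetical single-sample estimate: true gradient plus Gaussian noise.
    return true_grad + rng.gauss(0.0, noise_std)


def batched_gradient(rng, batch_size):
    # Average batch_size single-sample estimates into one batched estimate.
    samples = [noisy_gradient(rng) for _ in range(batch_size)]
    return sum(samples) / batch_size


def estimator_variance(batch_size, trials=2000, seed=0):
    # Empirical variance of the batched estimate over repeated trials.
    rng = random.Random(seed)
    estimates = [batched_gradient(rng, batch_size) for _ in range(trials)]
    return statistics.variance(estimates)


if __name__ == "__main__":
    for b in (1, 4, 16):
        print(b, estimator_variance(b))
```

Under the independence assumption, the variance of the batched estimate shrinks roughly in proportion to one over the batch size, which is the kind of effect the finding above refers to.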

Abstract

Actor-Critic methods are widely used for their scalability, yet existing theoretical guarantees for infinite-horizon average-reward Markov Decision Processes (MDPs) often rely on restrictive ergodicity assumptions. We propose NAC-B, a Natural Actor-Critic with Batching, that achieves order-optimal regret of $\tilde{O}(\sqrt{T})$ in infinite-horizon average-reward MDPs under the unichain assumption, which permits both transient states and periodicity. This assumption is among the weakest under which the classic policy gradient theorem remains valid for average-reward settings. NAC-B employs function approximation for both the actor and the critic, enabling scalability to problems with large state and action spaces. The use of batching in our algorithm helps mitigate potential periodicity in the MDP and reduces stochasticity in gradient estimates, and our analysis formalizes these…
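
For readers unfamiliar with the regret notion, the following is a sketch of the definition commonly used in the average-reward literature; the summary above does not state it, so the notation ($J^{*}$, $r$, $s_t$, $a_t$) is an assumption rather than the paper's own.

```latex
% Assumed notation: J^{\pi} is the long-run average reward of policy \pi,
% J^{*} = \max_{\pi} J^{\pi}, and (s_t, a_t) is the state-action pair at step t.
\mathrm{Reg}_T \;=\; \sum_{t=0}^{T-1} \bigl( J^{*} - r(s_t, a_t) \bigr)
```

Under this definition, a bound of $\tilde{O}(\sqrt{T})$ means the per-step regret $\mathrm{Reg}_T / T$ decays at a rate of $\tilde{O}(1/\sqrt{T})$.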

Taxonomy

Topics: Auction Theory and Applications