A Sharper Global Convergence Analysis for Average Reward Reinforcement Learning via an Actor-Critic Approach

Swetha Ganesh; Washim Uddin Mondal; Vaneet Aggarwal

arXiv:2407.18878·cs.LG·May 7, 2025

TL;DR

This paper introduces an actor-critic algorithm for average-reward reinforcement learning that achieves a near-optimal global convergence rate without prior knowledge of mixing or hitting times. The rate does not scale with the size of the state space, so the result also applies to infinite state spaces.

Contribution

It presents the first global convergence rate of $\tilde{O}(1/T)$, where $T$ is the horizon length, for average-reward MDPs that requires no knowledge of mixing or hitting times and does not scale with the size of the state space.

Findings

01. Achieves an $\tilde{O}(1/T)$ convergence rate.

02. Applicable to infinite state spaces.

03. Does not require knowledge of mixing or hitting times.

Abstract

This work examines average-reward reinforcement learning with general policy parametrization. Existing state-of-the-art (SOTA) guarantees for this problem are either suboptimal or hindered by several challenges, including poor scalability with respect to the size of the state-action space, high iteration complexity, and dependence on knowledge of mixing times and hitting times. To address these limitations, we propose a Multi-level Monte Carlo-based Natural Actor-Critic (MLMC-NAC) algorithm. Our work is the first to achieve a global convergence rate of $\tilde{O}(1/T)$ for average-reward Markov Decision Processes (MDPs) (where $T$ is the horizon length), without requiring knowledge of mixing and hitting times. Moreover, the convergence rate does not scale with the size of the state space, so the result remains applicable even to infinite state spaces.
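The abstract names Multi-level Monte Carlo (MLMC) as the estimation tool but does not spell out the estimator. As a rough illustration only, the sketch below shows the generic MLMC construction often used with Markovian samples: draw a random level $J$ with $P(J = j) = 2^{-j}$, average the first $2^J$ samples, and correct a single-sample estimate with the scaled difference between the fine and coarse averages. Its expectation matches that of a long-run average, while the expected number of samples grows only logarithmically in the cap, which is why such estimators need no mixing-time input. The names `mlmc_estimate`, `sample_fn`, `t_max`, and `rng` are hypothetical, and this is not claimed to be the exact estimator used in MLMC-NAC.

```python
import random


def mlmc_estimate(sample_fn, t_max, rng=random):
    """Generic MLMC estimate of a mean from sequential samples.

    sample_fn: hypothetical callable returning the next scalar sample
               (for example, one gradient coordinate along a trajectory).
    t_max:     cap on the number of samples used in one call.
    rng:       any object with a random() method returning a float in [0, 1).
    """
    # Draw the level J with P(J = j) = 2^{-j} for j = 1, 2, ...
    level = 1
    while rng.random() < 0.5:
        level += 1

    # Use 2^J samples if that fits under the cap, otherwise a single sample.
    n = 2 ** level if 2 ** level <= t_max else 1
    samples = [sample_fn() for _ in range(n)]

    base = samples[0]  # level-0 estimate: the first sample alone
    if n == 1:
        return base

    fine = sum(samples) / n                      # average of all 2^J samples
    coarse = sum(samples[: n // 2]) / (n // 2)   # average of the first 2^(J-1)
    # Multiplying by 2^J cancels P(J = j), so the level differences
    # telescope in expectation.
    return base + n * (fine - coarse)


if __name__ == "__main__":
    noisy = lambda: 1.0 + random.gauss(0.0, 0.1)
    print(mlmc_estimate(noisy, t_max=64))
```

In an actor-critic loop, one such call would typically replace a fixed-length trajectory average when estimating the critic or policy-gradient direction; the cap `t_max` bounds the worst-case cost of a single call.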
