Primal-Only Actor Critic Algorithm for Robust Constrained Average Cost MDPs

Anirudh Satheesh; Sooraj Sathish; Swetha Ganesh; Keenan Powell; Vaneet Aggarwal

arXiv:2511.05758 · cs.LG · November 11, 2025

TL;DR

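This paper introduces a primal-only actor-critic algorithm for robust constrained average-cost MDPs. It addresses the lack of strong duality and the non-contraction of the robust Bellman operator, and achieves \(\epsilon\)-feasibility and \(\epsilon\)-optimality with sample complexities of \(\tilde{O}(\epsilon^{-4})\) and \(\tilde{O}(\epsilon^{-6})\) with and without a slackness assumption, respectively.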

Contribution

It proposes the first primal-only actor-critic method for average-cost RCMDPs, handling the lack of strong duality and the non-contractive robust Bellman operator, and provides guarantees of \(\epsilon\)-feasibility and \(\epsilon\)-optimality.

Findings

01

Achieves \(\epsilon\)-feasibility and \(\epsilon\)-optimality.

02

Establishes sample complexities of \(\tilde{O}(\epsilon^{-4})\) and \(\tilde{O}(\epsilon^{-6})\) with and without a slackness assumption, respectively.

03

Demonstrates effectiveness in robust constrained average-cost settings.

Abstract

In this work, we study the problem of finding robust and safe policies in Robust Constrained Average-Cost Markov Decision Processes (RCMDPs). A key challenge in this setting is the lack of strong duality, which prevents the direct use of standard primal-dual methods for constrained RL. Additional difficulties arise from the average-cost setting, where the robust Bellman operator is not a contraction under any norm. To address these challenges, we propose an actor-critic algorithm for average-cost RCMDPs. We show that our method achieves both \(\epsilon\)-feasibility and \(\epsilon\)-optimality, and we establish sample complexities of \(\tilde{O}\left(\epsilon^{-4}\right)\) and \(\tilde{O}\left(\epsilon^{-6}\right)\) with and without a slackness assumption, respectively, which are comparable to those in the discounted setting.
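One way to see why an undiscounted robust Bellman operator cannot be a contraction is shift invariance: adding a constant to every state's value shifts the output by the same constant, so the distance between the two inputs is preserved under any norm. The sketch below illustrates this on a toy two-state example. It is not the paper's algorithm; the function `robust_operator`, the finite uncertainty set `kernels`, and all numbers are assumptions made up for illustration.

```python
# Illustrative sketch only: a toy undiscounted robust evaluation operator
# for a fixed policy, with a hypothetical finite uncertainty set.

def robust_operator(v, cost, kernels):
    """Return T(v), where T(v)[s] = cost[s] + max_P sum_t P[s][t] * v[t].

    `kernels` is a list of row-stochastic transition matrices; the worst
    case is taken independently for each state (a rectangular set).
    """
    n = len(v)
    out = []
    for s in range(n):
        worst = max(sum(P[s][t] * v[t] for t in range(n)) for P in kernels)
        out.append(cost[s] + worst)
    return out


def sup_norm(x):
    """Largest absolute entry of a vector."""
    return max(abs(a) for a in x)


cost = [1.0, 0.0]
kernels = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.5, 0.5]],
]

v = [0.3, -1.2]
shift = 5.0
u = [x + shift for x in v]  # u differs from v by a constant vector

Tv = robust_operator(v, cost, kernels)
Tu = robust_operator(u, cost, kernels)

# Distance between inputs and between outputs (sup norm used as an example).
gap_in = sup_norm([a - b for a, b in zip(u, v)])
gap_out = sup_norm([a - b for a, b in zip(Tu, Tv)])
print(gap_in, gap_out)
```

Because each row of every kernel sums to one, `Tu - Tv` equals `u - v` up to floating-point rounding, so no factor strictly below one can bound the output gap. Subtracting a constant gain term inside the operator, as average-cost formulations commonly do, would not change this. With a discount factor multiplying the expectation, the same shift would shrink by that factor instead.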

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control