Primal-Only Actor Critic Algorithm for Robust Constrained Average Cost MDPs
Anirudh Satheesh, Sooraj Sathish, Swetha Ganesh, Keenan Powell, Vaneet Aggarwal

TL;DR
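This paper introduces a primal-only actor-critic algorithm for robust constrained average-cost MDPs (RCMDPs). It addresses two obstacles, the lack of strong duality and a robust Bellman operator that is not a contraction, and achieves \(\epsilon\)-feasibility and \(\epsilon\)-optimality with sample complexities of \(\tilde{O}\left(\epsilon^{-4}\right)\) and \(\tilde{O}\left(\epsilon^{-6}\right)\).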
Contribution
It proposes the first primal-only actor-critic method for average-cost RCMDPs, overcoming key theoretical challenges and providing convergence guarantees.
Findings
Achieves \(\epsilon\)-feasibility and \(\epsilon\)-optimality.
Establishes sample complexities of \(\tilde{O}\left(\epsilon^{-4}\right)\) and \(\tilde{O}\left(\epsilon^{-6}\right)\) with and without a slackness assumption, respectively.
Demonstrates effectiveness in robust constrained average-cost settings.
Abstract
In this work, we study the problem of finding robust and safe policies in Robust Constrained Average-Cost Markov Decision Processes (RCMDPs). A key challenge in this setting is the lack of strong duality, which prevents the direct use of standard primal-dual methods for constrained RL. Additional difficulties arise from the average-cost setting, where the robust Bellman operator is not a contraction under any norm. To address these challenges, we propose an actor-critic algorithm for average-cost RCMDPs. We show that our method achieves both \(\epsilon\)-feasibility and \(\epsilon\)-optimality, and we establish sample complexities of \(\tilde{O}\left(\epsilon^{-4}\right)\) and \(\tilde{O}\left(\epsilon^{-6}\right)\) with and without a slackness assumption, respectively, which are comparable to those of the discounted setting.
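The non-contraction claim in the abstract can be illustrated with a small numerical sketch. This is not the paper's construction. It assumes a fixed policy, a hypothetical finite uncertainty set of transition matrices (`kernels`), and a worst case taken independently per state. Because every transition row sums to one and there is no discount factor, shifting the value estimate by a constant shifts the operator's output by the same constant, so the distance between the two inputs is preserved in any norm rather than shrunk.

```python
import numpy as np

def robust_bellman(h, cost, kernels):
    """Undiscounted robust evaluation operator for a fixed policy (illustrative).

    kernels has shape (K, S, S): K candidate transition matrices forming a
    hypothetical finite uncertainty set. The worst case (largest expected
    cost-to-go) is chosen independently for each state.
    """
    # kernels @ h has shape (K, S): expected next-state value under each kernel.
    return cost + np.max(kernels @ h, axis=0)

rng = np.random.default_rng(0)
S, K = 4, 3  # number of states and of candidate kernels (arbitrary choices)

# Random row-stochastic transition matrices.
kernels = rng.random((K, S, S))
kernels /= kernels.sum(axis=2, keepdims=True)
cost = rng.random(S)

h1 = rng.normal(size=S)
h2 = h1 + 2.5  # same vector shifted by a constant

diff_in = h2 - h1
diff_out = robust_bellman(h2, cost, kernels) - robust_bellman(h1, cost, kernels)

# The output difference equals the input difference, so no norm contracts it.
print(np.allclose(diff_out, diff_in))
```

In the discounted setting the next-state term would be multiplied by a factor below one, which shrinks the constant shift and gives the usual sup-norm contraction. The sketch only shows why that argument is unavailable here.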
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control
