Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm
Yang Xu, Swetha Ganesh, Washim Uddin Mondal, Qinbo Bai, and Vaneet Aggarwal

TL;DR
This paper introduces a primal-dual actor-critic algorithm for average reward constrained MDPs that guarantees global convergence and near-optimal constraint violation rates, advancing theoretical understanding in this domain.
Contribution
It presents a novel primal-dual natural actor-critic algorithm for average reward CMDPs with general parametrization, with proven global convergence and rate guarantees that hold even without knowledge of the mixing time.
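For orientation, the problem class and a generic primal-dual natural-gradient update can be written as below. This is a sketch in the standard notation of the average reward CMDP literature, not a transcription of the paper: the symbols $J_r$, $J_c$, $\theta$, $\lambda$, step sizes $\alpha$ and $\beta$, and the Fisher information matrix $F$ are assumed notation, the constraint is assumed to take the form $J_c \ge 0$, and the paper's exact update rules may differ. In an actor-critic method, the gradient, the natural-gradient direction, and $J_c$ are replaced by estimates built from a learned critic.

```latex
\begin{aligned}
&\max_{\theta}\; J_r(\theta) \quad \text{s.t.} \quad J_c(\theta) \ge 0,
\qquad J_g(\theta) = \lim_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}_{\pi_\theta}\Big[\textstyle\sum_{t=0}^{T-1} g(s_t,a_t)\Big],\; g \in \{r, c\},\\
&\mathcal{L}(\theta,\lambda) = J_r(\theta) + \lambda\, J_c(\theta), \qquad \lambda \ge 0,\\
&\theta_{k+1} = \theta_k + \alpha\, F(\theta_k)^{\dagger}\, \nabla_\theta \mathcal{L}(\theta_k,\lambda_k),
\qquad \lambda_{k+1} = \big[\lambda_k - \beta\, J_c(\theta_k)\big]_+ .
\end{aligned}
```

The dual variable $\lambda$ rises while the constraint is violated ($J_c < 0$), which shifts the primal objective toward the constraint signal.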
Findings
Achieves $\tilde{O}(1/\sqrt{T})$ convergence rate with known mixing time
Maintains near-optimal rates without knowledge of the mixing time, under certain conditions
Matches the theoretical lower bound for MDPs, establishing a new benchmark for average reward CMDPs
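Rates such as $\tilde{O}(1/\sqrt{T})$ in primal-dual CMDP analyses usually bound two averaged quantities: the optimality gap and the constraint violation. The display below sketches that common form; it is not the paper's theorem, which may use expectations, a different averaging, or different constants. Here $J_r$ and $J_c$ are assumed to denote the long-run average reward and constraint value (constraint $J_c \ge 0$), $\theta_t$ the policy parameters at step $t$, $\pi^*$ an optimal feasible policy, and $\tilde{O}$ hides logarithmic factors.

```latex
\frac{1}{T}\sum_{t=0}^{T-1}\Big(J_r^{\pi^*} - J_r(\theta_t)\Big) \le \tilde{O}\!\Big(\frac{1}{\sqrt{T}}\Big),
\qquad
\Big[-\frac{1}{T}\sum_{t=0}^{T-1} J_c(\theta_t)\Big]_+ \le \tilde{O}\!\Big(\frac{1}{\sqrt{T}}\Big)
```

Read as a sample-complexity statement, driving both quantities below $\epsilon$ needs a horizon of roughly $T = \tilde{O}(\epsilon^{-2})$.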
Abstract
This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) with general parametrization. We propose a Primal-Dual Natural Actor-Critic algorithm that handles constraints while retaining a fast convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of $\tilde{O}(1/\sqrt{T})$ over a horizon of length $T$ when the mixing time, $\tau_{\mathrm{mix}}$, is known to the learner. Without knowledge of $\tau_{\mathrm{mix}}$, the achievable rates change but remain near-optimal under certain conditions. Our results match the theoretical lower bound for Markov Decision Processes and establish a new benchmark in the theoretical exploration of average reward CMDPs.
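The abstract names the method but not its update rules. As a rough illustration of the primal-dual idea, here is a minimal tabular sketch of a primal-dual natural policy gradient loop for an average reward CMDP. It is not the authors' algorithm and rests on loudly stated assumptions: exact policy evaluation stands in for the paper's sampled critic, a tabular softmax policy stands in for general parametrization, the constraint is taken as $J_c \ge 0$, and the 2-state, 2-action instance and the names `policy`, `evaluate`, and `primal_dual_npg` are hypothetical.

```python
import math

# Hypothetical 2-state, 2-action CMDP (illustrative numbers, not from the paper).
# P[s][a][s2] = transition probability; R = reward; C = constraint utility.
# Assumed constraint convention: J_c(pi) >= 0.
P = [[[0.8, 0.2], [0.8, 0.2]],
     [[0.3, 0.7], [0.3, 0.7]]]
R = [[1.0, 0.0], [1.0, 0.0]]    # action 0 earns reward
C = [[-1.0, 1.0], [-1.0, 1.0]]  # action 1 earns constraint utility
S, A = 2, 2


def policy(theta):
    """Tabular softmax policy pi[s][a] from logits theta[s][a]."""
    pi = []
    for s in range(S):
        m = max(theta[s])  # subtract the max for numerical stability
        w = [math.exp(t - m) for t in theta[s]]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi


def evaluate(pi, g, iters=100):
    """Exact average value J and advantages adv[s][a] of per-step signal g.

    Assumes the chain induced by pi is ergodic and aperiodic, so both
    fixed-point iterations below converge.
    """
    P_pi = [[sum(pi[s][a] * P[s][a][t] for a in range(A)) for t in range(S)]
            for s in range(S)]
    g_pi = [sum(pi[s][a] * g[s][a] for a in range(A)) for s in range(S)]
    d = [1.0 / S] * S
    for _ in range(iters):  # stationary distribution: d <- d P_pi
        d = [sum(d[s] * P_pi[s][t] for s in range(S)) for t in range(S)]
    J = sum(d[s] * g_pi[s] for s in range(S))
    h = [0.0] * S
    for _ in range(iters):  # Poisson equation: h <- g_pi - J + P_pi h
        h = [g_pi[s] - J + sum(P_pi[s][t] * h[t] for t in range(S))
             for s in range(S)]
    Q = [[g[s][a] - J + sum(P[s][a][t] * h[t] for t in range(S))
          for a in range(A)] for s in range(S)]
    adv = [[Q[s][a] - sum(pi[s][b] * Q[s][b] for b in range(A))
            for a in range(A)] for s in range(S)]
    return J, adv


def primal_dual_npg(T=1000, eta=0.1, beta=0.1):
    """Exact-oracle primal-dual NPG; returns averaged J_r, averaged J_c, final lambda."""
    theta = [[0.0] * A for _ in range(S)]
    lam, sum_r, sum_c = 0.0, 0.0, 0.0
    for _ in range(T):
        # Primal step: for a tabular softmax policy, the natural gradient of the
        # Lagrangian J_r + lam * J_c is its advantage (constants folded into eta).
        g = [[R[s][a] + lam * C[s][a] for a in range(A)] for s in range(S)]
        _, adv = evaluate(policy(theta), g)
        theta = [[theta[s][a] + eta * adv[s][a] for a in range(A)]
                 for s in range(S)]
        # Dual step on the updated policy: projected descent on lambda,
        # which grows while the constraint J_c >= 0 is violated.
        pi = policy(theta)
        J_r, _ = evaluate(pi, R)
        J_c, _ = evaluate(pi, C)
        lam = max(0.0, lam - beta * J_c)
        sum_r += J_r
        sum_c += J_c
    return sum_r / T, sum_c / T, lam


if __name__ == "__main__":
    avg_r, avg_c, lam = primal_dual_npg()
    print(f"avg J_r = {avg_r:.3f}, avg J_c = {avg_c:.3f}, lambda = {lam:.3f}")
```

In this toy instance the transitions do not depend on the action, so it can be checked by hand: $J_r$ equals the probability of choosing action 0 and $J_c = 1 - 2J_r$, so the constrained optimum is $J_r = 0.5$. With constant step sizes the primal and dual iterates cycle around that point, which is why the sketch reports time-averaged values, the quantities primal-dual guarantees are usually stated for.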
