Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction
Donghao Ying, Mengzi Amy Guo, Hyunin Lee, Yuhao Ding, Javad Lavaei, Zuo-Jun Max Shen

TL;DR
This paper introduces a variance-reduced primal-dual policy gradient algorithm for concave CMDPs, achieving improved convergence rates and zero constraint violation, with theoretical guarantees and numerical validation.
Contribution
It develops VR-PDPG, a novel variance-reduced primal-dual policy gradient algorithm for concave CMDPs. It proves global convergence with improved rates and control of constraint violation, despite the loss of additive structure and the nonconcavity of the policy optimization problem.
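For orientation, a generic Lagrangian form of a concave CMDP and a primal-dual iteration of the kind the abstract describes (policy gradient ascent on the primal variable, projected sub-gradient descent on the dual variable) can be written as follows. This is a sketch under assumptions, not the paper's exact statement: $\lambda^{\pi}$ is taken to be the unnormalized discounted occupancy measure, $f$ and $g = (g_1,\dots,g_m)$ are concave, the step sizes $\eta, \alpha$ and the dual bound $\mu_{\max}$ are illustrative, and the variance-reduced gradient estimator is not shown.

$$
\max_{\theta}\ f(\lambda^{\pi_\theta}) \quad \text{s.t.} \quad g(\lambda^{\pi_\theta}) \ge 0, \qquad \lambda^{\pi}(s,a) = \sum_{t=0}^{\infty} \gamma^{t}\,\Pr\nolimits^{\pi}(s_t = s,\ a_t = a),
$$
$$
\mathcal{L}(\theta,\mu) = f(\lambda^{\pi_\theta}) + \mu^{\top} g(\lambda^{\pi_\theta}), \qquad \mu \in \mathbb{R}^{m}_{\ge 0},
$$
$$
\theta_{k+1} = \theta_k + \eta\, \nabla_\theta \mathcal{L}(\theta_k, \mu_k), \qquad \mu_{k+1} = \Pi_{[0,\mu_{\max}]^{m}}\big(\mu_k - \alpha\, g(\lambda^{\pi_{\theta_k}})\big).
$$

For fixed $\mu \ge 0$, $\mathcal{L}$ is concave in $\lambda$ but generally nonconcave in $\theta$; this gap is what the hidden-concavity argument mentioned in the abstract has to bridge.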
Findings
Achieves an $O(T^{-1/3})$ convergence rate in the exact setting.
Improves to $O(T^{-1/2})$ under strong concavity.
Attains $\tilde{O}(\epsilon^{-4})$ sample complexity in the stochastic setting.
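As a rough reading of the exact-setting rates, ignoring the constants hidden in the $O(\cdot)$ notation, inverting them gives the number of iterations needed to bring the average optimality gap and constraint violation below a target $\epsilon$:

$$
T^{-1/3} \le \epsilon \iff T \ge \epsilon^{-3}, \qquad T^{-1/2} \le \epsilon \iff T \ge \epsilon^{-2}.
$$

For $\epsilon = 0.01$ this is on the order of $10^{6}$ versus $10^{4}$ iterations, which is the practical effect of strong concavity.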
Abstract
We study Concave Constrained Markov Decision Processes (Concave CMDPs), where both the objective and the constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an $O(T^{-1/3})$ convergence rate for both the average optimality gap and constraint violation, which further improves to $O(T^{-1/2})$ under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an…
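The primal-dual loop described in the abstract can be sketched in Python with NumPy on a small tabular MDP. Everything problem-specific here is an assumption, not taken from the paper: a softmax policy, a hypothetical concave objective $f(\lambda) = \langle r, \lambda\rangle - \tfrac{\kappa}{2}\|\lambda\|^2$, a single linear constraint $\langle u, \lambda\rangle \ge b$, and the helper names `softmax_policy`, `occupancy`, `policy_gradient`, and `pdpg`. The sketch uses exact gradients, so it omits VR-PDPG's variance-reduction component. The primal step relies on the chain rule: the gradient of $\mathcal{L}(\lambda^{\pi_\theta}, \mu)$ in $\theta$ is an ordinary policy gradient with the pseudo-reward $\nabla_\lambda \mathcal{L}$ evaluated at the current occupancy measure.

```python
import numpy as np

# Hypothetical instance (not from the paper): concave objective
# f(lam) = <r, lam> - (KAPPA / 2) * ||lam||^2 and linear constraint <u, lam> >= b.
KAPPA = 0.1


def softmax_policy(theta):
    """Tabular softmax policy pi(a|s) from logits theta of shape (S, A)."""
    z = theta - theta.max(axis=1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def occupancy(theta, P, rho, gamma):
    """Unnormalized discounted occupancy lam(s, a) = sum_t gamma^t Pr(s_t = s, a_t = a)."""
    pi = softmax_policy(theta)
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)  # discounted state visitation
    return d[:, None] * pi, d, pi, P_pi


def policy_gradient(z, theta, P, rho, gamma):
    """Exact gradient in theta of V_z = <z, lam^{pi_theta}> for a fixed reward z."""
    _, d, pi, P_pi = occupancy(theta, P, rho, gamma)
    S = P.shape[0]
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * z).sum(axis=1))
    Q = z + gamma * P @ V  # (S, A, S) @ (S,) -> (S, A)
    return d[:, None] * pi * (Q - V[:, None])  # softmax policy gradient theorem


def f(lam, r):
    return (r * lam).sum() - 0.5 * KAPPA * (lam ** 2).sum()


def g(lam, u, b):
    return (u * lam).sum() - b  # the policy is feasible when g >= 0


def pdpg(P, rho, r, u, b, gamma=0.9, eta=0.01, alpha=0.05, mu_max=10.0, T=500):
    """Exact-gradient primal-dual policy gradient (no variance reduction)."""
    theta = np.zeros_like(r)
    mu = 0.0
    history = []
    for _ in range(T):
        lam = occupancy(theta, P, rho, gamma)[0]
        # Pseudo-reward: gradient of the Lagrangian f + mu * g in lam at the current lam.
        z = (r - KAPPA * lam) + mu * u
        theta = theta + eta * policy_gradient(z, theta, P, rho, gamma)  # primal ascent
        mu = float(np.clip(mu - alpha * g(lam, u, b), 0.0, mu_max))  # projected dual descent
        history.append((f(lam, r), g(lam, u, b), mu))
    return theta, mu, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 4, 2
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)  # normalize into transition probabilities
    rho = np.full(S, 1.0 / S)
    r, u = rng.random((S, A)), rng.random((S, A))
    _, mu, hist = pdpg(P, rho, r, u, b=4.0)
    print("final objective, constraint value, multiplier:", hist[-1])
```

Clipping the multiplier to $[0, \mu_{\max}]$ is the projection step; the bound $\mu_{\max}$ is an illustrative choice that keeps the dual iterates bounded.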
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Age of Information Optimization
