Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction

Donghao Ying; Mengzi Amy Guo; Hyunin Lee; Yuhao Ding; Javad Lavaei; Zuo-Jun Max Shen

arXiv:2205.10715 · cs.LG · May 28, 2024 · 1 citation

TL;DR

This paper introduces a variance-reduced primal-dual policy gradient algorithm for concave CMDPs, achieving improved convergence rates and zero constraint violation, with theoretical guarantees and numerical validation.

Contribution

It develops VR-PDPG, a novel variance-reduced algorithm for concave CMDPs with proven global convergence, improved convergence rates, and control of constraint violation, overcoming both the nonconcavity of the problem and the loss of additive structure in the objective.

Findings

01

Achieves an $O(T^{-1/3})$ convergence rate for both the average optimality gap and constraint violation in the exact setting.

02

Improves to $O(T^{-1/2})$ when the objective is strongly concave in the occupancy measure.

03

Attains $\tilde{O}(\epsilon^{-4})$ sample complexity in the stochastic setting.
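
As a reading aid not stated on this page, each rate can be inverted into an iteration budget. Assuming the exact-setting bound has the form $C\,T^{-1/3}$ and the strongly concave bound has the form $C'\,T^{-1/2}$, where $C$ and $C'$ are hypothetical problem-dependent constants, reaching accuracy $\epsilon$ requires:

```latex
C\,T^{-1/3} \le \epsilon \iff T \ge \left(\frac{C}{\epsilon}\right)^{3} = O(\epsilon^{-3}),
\qquad
C'\,T^{-1/2} \le \epsilon \iff T \ge \left(\frac{C'}{\epsilon}\right)^{2} = O(\epsilon^{-2}).
```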

Abstract

We study Concave Constrained Markov Decision Processes (Concave CMDPs) where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an $O(T^{-1/3})$ convergence rate for both the average optimality gap and constraint violation, which further improves to $O(T^{-1/2})$ under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an…
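
The primal-dual structure described in the abstract (policy gradient ascent on the primal, projected sub-gradient descent on the dual) can be illustrated with a minimal sketch. This is not the authors' VR-PDPG code (the official repository is hyunin-lee/vr-pdpg) and it omits variance reduction entirely. It assumes exact gradients, a hypothetical one-state MDP whose normalized occupancy measure equals the softmax policy, entropy as the concave objective, and a linear (hence concave) constraint $\langle c, \pi \rangle \ge b$. The names `primal_dual_toy`, `eta_theta`, `eta_mu`, and `mu_max` are illustrative only.

```python
import math


def softmax(theta):
    # Numerically stable softmax over action logits.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def primal_dual_toy(c, b, eta_theta=0.5, eta_mu=0.05, mu_max=10.0, iters=3000):
    """Exact-gradient primal-dual iteration on a hypothetical one-state concave CMDP.

    Objective F(pi) = entropy(pi) (concave in the occupancy measure pi).
    Constraint G(pi) = <c, pi> - b >= 0 (linear, hence concave).
    Lagrangian L(theta, mu) = F(pi(theta)) + mu * G(pi(theta)).
    """
    k = len(c)
    theta = [0.0] * k  # softmax logits (primal variable)
    mu = 0.0           # Lagrange multiplier (dual variable)
    for _ in range(iters):
        pi = softmax(theta)
        # Pseudo-reward: gradient of the Lagrangian w.r.t. the occupancy measure.
        r = [-math.log(p) - 1.0 + mu * ca for p, ca in zip(pi, c)]
        baseline = sum(p * ra for p, ra in zip(pi, r))
        # Softmax chain rule: dL/dtheta_a = pi_a * (r_a - <pi, r>).
        grad = [p * (ra - baseline) for p, ra in zip(pi, r)]
        # Dual sub-gradient is the constraint value at the current policy.
        g = sum(p * ca for p, ca in zip(pi, c)) - b
        # Primal: gradient ascent. Dual: projected descent onto [0, mu_max].
        theta = [t + eta_theta * gr for t, gr in zip(theta, grad)]
        mu = min(max(mu - eta_mu * g, 0.0), mu_max)
    return softmax(theta), mu


if __name__ == "__main__":
    pi, mu = primal_dual_toy([1.0, 0.0, 0.0], 0.5)
    print(pi, mu)
```

With `c = [1, 0, 0]` and `b = 0.5`, the Lagrangian maximizer for a fixed multiplier is $\pi_0 = e^{\mu}/(e^{\mu}+2)$, so the iterates should approach $\pi \approx (0.5, 0.25, 0.25)$ and $\mu \approx \ln 2$. Differentiating the Lagrangian with respect to the occupancy measure and then applying the policy-gradient chain rule is a common way to handle objectives that are nonlinear in the occupancy measure; the paper's actual estimator and projection set may differ.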

Code & Models

Repositories

hyunin-lee/vr-pdpg
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Age of Information Optimization