Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs

Sergio Rozada; Dongsheng Ding; Antonio G. Marques; Alejandro Ribeiro

arXiv:2408.10015·cs.AI·April 7, 2025

Open Access

TL;DR

This paper introduces a novel deterministic policy gradient primal-dual method for solving constrained Markov decision processes in continuous spaces, with proven convergence and successful application to control problems.

Contribution

It develops the first deterministic policy search method for continuous-space constrained MDPs with convergence guarantees and practical effectiveness.

Findings

01

Converges at a sub-linear rate to an optimal regularized primal-dual pair.

02

Successfully applied to robot navigation and fluid control tasks.

03

Proves convergence while accounting for function approximation errors.

Abstract

We study the problem of computing deterministic optimal policies for constrained Markov decision processes (MDPs) with continuous state and action spaces, which are widely encountered in constrained dynamical systems. Designing deterministic policy gradient methods in continuous state and action spaces is particularly challenging due to the lack of enumerable state-action pairs and the adoption of deterministic policies, hindering the application of existing policy gradient methods. To this end, we develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence. Specifically, we leverage regularization of the Lagrangian of the constrained MDP to propose a deterministic policy gradient primal-dual (D-PGPD) algorithm that updates the deterministic policy via a quadratic-regularized gradient ascent step and the dual…
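
The regularized primal-dual idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's D-PGPD algorithm: it replaces the constrained MDP with a hypothetical scalar problem (maximize -(theta - 2)^2 subject to 1 - theta >= 0), and the names `regularized_primal_dual`, `tau`, and `eta`, the form of the regularizer, and the projected dual descent step are all assumptions made for illustration. It only shows how adding quadratic regularization to a Lagrangian lets a plain gradient ascent-descent iteration settle near the constrained optimum.

```python
def regularized_primal_dual(tau=0.01, eta=0.05, steps=5000):
    """Toy regularized primal-dual iteration (illustrative, not D-PGPD).

    Hypothetical problem: maximize r(theta) = -(theta - 2)^2
    subject to g(theta) = 1 - theta >= 0.

    Assumed regularized Lagrangian:
        L_tau(theta, lam) = r(theta) + lam * g(theta)
                            - (tau / 2) * theta**2 + (tau / 2) * lam**2
    The primal variable ascends L_tau; the dual variable descends it
    and is projected onto lam >= 0.
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Gradients of the regularized Lagrangian at the current iterate.
        grad_theta = -2.0 * (theta - 2.0) - lam - tau * theta
        grad_lam = (1.0 - theta) + tau * lam
        # Simultaneous update: primal ascent, projected dual descent.
        theta = theta + eta * grad_theta
        lam = max(0.0, lam - eta * grad_lam)
    return theta, lam


if __name__ == "__main__":
    theta, lam = regularized_primal_dual()
    print(f"theta = {theta:.4f}, lambda = {lam:.4f}")
```

The unregularized solution of this toy problem is theta = 1 with multiplier 2; with a small `tau` the iterates settle close to that pair, at the cost of a small bias introduced by the regularization.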

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Control Systems Optimization · Optimization and Search Problems