Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs
Sergio Rozada, Dongsheng Ding, Antonio G. Marques, Alejandro Ribeiro

TL;DR
This paper introduces a novel deterministic policy gradient primal-dual method for solving constrained Markov decision processes in continuous spaces, with proven convergence and successful application to control problems.
Contribution
It develops the first deterministic policy search method for continuous-space constrained MDPs with convergence guarantees and practical effectiveness.
Findings
The primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
The method is applied successfully to robot navigation and fluid control tasks.
The convergence analysis accounts for function approximation errors.
Abstract
We study the problem of computing deterministic optimal policies for constrained Markov decision processes (MDPs) with continuous state and action spaces, which are widely encountered in constrained dynamical systems. Designing deterministic policy gradient methods in continuous state and action spaces is particularly challenging due to the lack of enumerable state-action pairs and the adoption of deterministic policies, hindering the application of existing policy gradient methods. To this end, we develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence. Specifically, we leverage regularization of the Lagrangian of the constrained MDP to propose a deterministic policy gradient primal-dual (D-PGPD) algorithm that updates the deterministic policy via a quadratic-regularized gradient ascent step and the dual…
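As a rough illustration of the regularized primal-dual idea described above, the following sketch runs gradient ascent on a primal variable and projected gradient descent on a dual variable for a regularized Lagrangian. It is not the paper's D-PGPD algorithm and involves no MDP: the scalar problem (maximize r(x) = -(x - 2)^2 subject to g(x) = 1 - x >= 0), the function name, the step size eta, and the regularization weight tau are all assumptions chosen for illustration. The abstract is cut off before it describes the dual update, so the dual step here is a standard projected, regularized descent step, not a claim about the authors' method.

```python
def regularized_primal_dual(eta=0.05, tau=0.01, steps=2000):
    """Toy regularized primal-dual iteration (illustrative only).

    Regularized Lagrangian (assumed form):
        L_tau(x, lam) = r(x) + lam * g(x) - (tau / 2) * x**2 + (tau / 2) * lam**2
    with r(x) = -(x - 2)**2 and g(x) = 1 - x.
    The primal variable x ascends L_tau; the dual variable lam descends it
    and is projected onto lam >= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Gradient of L_tau with respect to x: r'(x) + lam * g'(x) - tau * x
        grad_x = -2.0 * (x - 2.0) - lam - tau * x
        # Gradient of L_tau with respect to lam: g(x) + tau * lam
        grad_lam = (1.0 - x) + tau * lam
        # Simultaneous update: ascent in x, projected descent in lam
        x, lam = x + eta * grad_x, max(0.0, lam - eta * grad_lam)
    return x, lam


if __name__ == "__main__":
    x, lam = regularized_primal_dual()
    print(f"x = {x:.4f}, lambda = {lam:.4f}")
```

The unregularized problem has its constrained optimum at x = 1 with multiplier 2. With a small tau the iterates settle near that point, slightly offset by the regularization, which reflects the trade-off a regularized method makes: the regularized saddle point is easier to converge to but differs slightly from the original one.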
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Control Systems Optimization · Optimization and Search Problems
