Convex Regularization and Convergence of Policy Gradient Flows under Safety Constraints

Pekka Malo; Lauri Viitasaari; Antti Suominen; Eeva Vilkkumaa; Olli Tahvonen

arXiv:2411.19193 · cs.LG · September 17, 2025


TL;DR

This paper introduces a convex regularization framework for safe reinforcement learning using policy gradient flows, providing convergence guarantees and practical methods for high-dimensional safety-critical applications.

Contribution

It develops a mean-field, Wasserstein gradient flow approach to safety-constrained RL, with theoretical solvability and convergence results, and extends regularization techniques to continuous state-action spaces with almost-sure safety constraints.

Findings

01 Exponential convergence under sufficient regularization

02 Solvability conditions for safety-constrained problems

03 Support for practical particle method implementations

Abstract

This paper examines reinforcement learning (RL) in infinite-horizon decision processes with almost-sure safety constraints, crucial for applications like autonomous systems, finance, and resource management. We propose a doubly-regularized RL framework combining reward and parameter regularization to address safety constraints in continuous state-action spaces. The problem is formulated as a convex regularized objective with parametrized policies in the mean-field regime. Leveraging mean-field theory and Wasserstein gradient flows, policies are modeled on an infinite-dimensional statistical manifold, with updates governed by parameter distribution gradient flows. Key contributions include solvability conditions for safety-constrained problems, smooth bounded approximations for gradient flows, and exponential convergence guarantees under sufficient regularization. General regularization…
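The abstract's core mechanism, updating a distribution over parameters along a Wasserstein gradient flow and approximating that distribution with particles, can be illustrated on a much smaller problem. The Python sketch below is not the authors' algorithm. It assumes a hypothetical one-dimensional objective F(μ) = ½(E_μ[θ] − a)² + (λ/2)E_μ[θ²] + σ²E_μ[log μ], with no MDP, reward, or safety constraint. The first term is convex in μ, the second plays the role of parameter regularization, and the entropy term turns the flow into mean-field Langevin dynamics, which the code discretizes as noisy gradient steps on a cloud of particles. All names (`mean_field_langevin`, `target`, `lam`, `sigma`, `step`) are illustrative.

```python
import math
import random
import statistics


def mean_field_langevin(n_particles=2000, target=2.0, lam=1.0, sigma=0.5,
                        step=0.01, n_steps=1000, seed=0):
    """Particle sketch of mean-field Langevin dynamics for a toy objective.

    Hypothetical objective over a distribution mu of scalar parameters theta:
        F(mu) = 0.5 * (E_mu[theta] - target)**2   # convex in mu
              + 0.5 * lam * E_mu[theta**2]        # parameter regularization
              + sigma**2 * E_mu[log mu]           # entropy regularization
    Each particle follows the negative gradient (in theta) of the first
    variation of the first two terms; the entropy term becomes Gaussian noise
    of size sqrt(2 * step) * sigma. Returns the final particles and the
    particle mean before the first step and after every step.
    """
    rng = random.Random(seed)
    theta = [rng.gauss(-3.0, 1.0) for _ in range(n_particles)]  # initial cloud
    noise_scale = math.sqrt(2.0 * step) * sigma
    means = [statistics.fmean(theta)]
    for _ in range(n_steps):
        m = means[-1]  # particles interact only through the empirical mean
        theta = [
            t - step * ((m - target) + lam * t) + noise_scale * rng.gauss(0.0, 1.0)
            for t in theta
        ]
        means.append(statistics.fmean(theta))
    return theta, means
```

For this toy the limit is Gaussian with mean `target / (1 + lam)` and variance `sigma**2 / lam` (up to an O(`step`) discretization bias in the variance), so `theta, means = mean_field_langevin()` should leave `statistics.fmean(theta)` near 1.0. Apart from Monte Carlo noise, the gap between the particle mean and its limit shrinks by the factor `1 - step * (1 + lam)` per step, so a larger `lam` gives faster exponential contraction. This loosely mirrors the "exponential convergence under sufficient regularization" finding; the paper's actual conditions and rates concern its own safety-constrained setting and are not reproduced here.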

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Stochastic Gradient Optimization Techniques · Risk and Portfolio Optimization

Methods: Entropy Regularization