Wasserstein Proximal Policy Gradient

Zhaoyu Zhu; Shuhan Zhang; Rui Gao; Shuang Li

arXiv:2603.02576·cs.LG·March 4, 2026

Open Access

TL;DR

This paper introduces Wasserstein Proximal Policy Gradient (WPPG), a policy gradient method for continuous-action, entropy-regularized reinforcement learning that uses Wasserstein geometry to update the policy without evaluating the policy's log density or its gradient.

Contribution

The paper develops WPPG, a new policy gradient algorithm based on Wasserstein geometry, with a proven global linear convergence rate and direct applicability to implicit stochastic policies specified as pushforward maps.

Findings

01 WPPG achieves competitive performance on continuous-control benchmarks.

02 The method avoids evaluating policy log densities, simplifying implementation.

03 Global linear convergence is established for the proposed algorithm.

Abstract

We study policy gradient methods for continuous-action, entropy-regularized reinforcement learning through the lens of Wasserstein geometry. Starting from a Wasserstein proximal update, we derive Wasserstein Proximal Policy Gradient (WPPG) via an operator-splitting scheme that alternates an optimal transport update with a heat step implemented by Gaussian convolution. This formulation avoids evaluating the policy's log density or its gradient, making the method directly applicable to expressive implicit stochastic policies specified as pushforward maps. We establish a global linear convergence rate for WPPG, covering both exact policy evaluation and actor-critic implementations with controlled approximation error. Empirically, WPPG is simple to implement and attains competitive performance on standard continuous-control benchmarks.
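The alternating scheme described in the abstract can be illustrated at the level of action samples. The sketch below is not the authors' implementation: it assumes a single state, a hypothetical toy critic Q(a) = -0.5 * ||a - target||^2 with a known gradient, and represents the policy by a cloud of particles rather than a learned pushforward map. The names `grad_q`, `wppg_style_step`, `step_size`, and `temperature` are illustrative assumptions. Each iteration moves samples along the critic's action gradient (standing in for the transport update) and then adds Gaussian noise, which is how convolving the sample distribution with a Gaussian looks at the particle level (the heat step). No log density of the policy is evaluated anywhere.

```python
import numpy as np


def grad_q(actions, target):
    # Gradient in a of the hypothetical toy critic Q(a) = -0.5 * ||a - target||^2.
    return target - actions


def wppg_style_step(actions, target, step_size, temperature, rng):
    # Transport step (assumed form): push each sample along the critic's action gradient.
    transported = actions + step_size * grad_q(actions, target)
    # Heat step: adding N(0, 2 * temperature * step_size * I) noise to every sample
    # convolves the sample distribution with that Gaussian.
    noise = rng.standard_normal(actions.shape)
    return transported + np.sqrt(2.0 * temperature * step_size) * noise


def run(num_particles=5000, num_steps=200, step_size=0.1, temperature=0.5, seed=0):
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0])
    # Start from a deliberately poor, wide initial policy centred at the origin.
    actions = 3.0 * rng.standard_normal((num_particles, 2))
    for _ in range(num_steps):
        actions = wppg_style_step(actions, target, step_size, temperature, rng)
    return actions, target


if __name__ == "__main__":
    actions, target = run()
    print("sample mean:", actions.mean(axis=0))
    print("sample variance:", actions.var(axis=0))
```

For this quadratic toy critic the recursion is linear, so the samples settle around `target` with per-coordinate variance 2 * temperature / (2 - step_size), which approaches the entropy-regularized optimum's variance `temperature` as `step_size` shrinks. A practical actor-critic version would replace `grad_q` with the gradient of a learned critic and fit a pushforward network to the updated samples; those parts are outside this sketch.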

Taxonomy

Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control