Wasserstein Proximal Policy Gradient
Zhaoyu Zhu, Shuhan Zhang, Rui Gao, Shuang Li

TL;DR
This paper introduces Wasserstein Proximal Policy Gradient (WPPG), a novel reinforcement learning method leveraging Wasserstein geometry that enables efficient policy updates without requiring explicit policy density evaluations.
Contribution
The paper develops WPPG, a new policy gradient algorithm based on Wasserstein geometry, with a proven global linear convergence rate and direct applicability to implicit stochastic policies.
Findings
WPPG achieves competitive performance on continuous-control benchmarks.
The method avoids evaluating policy log densities, simplifying implementation.
Global linear convergence is established for the proposed algorithm.
Abstract
We study policy gradient methods for continuous-action, entropy-regularized reinforcement learning through the lens of Wasserstein geometry. Starting from a Wasserstein proximal update, we derive Wasserstein Proximal Policy Gradient (WPPG) via an operator-splitting scheme that alternates an optimal transport update with a heat step implemented by Gaussian convolution. This formulation avoids evaluating the policy's log density or its gradient, making the method directly applicable to expressive implicit stochastic policies specified as pushforward maps. We establish a global linear convergence rate for WPPG, covering both exact policy evaluation and actor-critic implementations with controlled approximation error. Empirically, WPPG is simple to implement and attains competitive performance on standard continuous-control benchmarks.
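The alternation described in the abstract can be illustrated with a small particle sketch. This is not the authors' implementation. It assumes one standard reading of the splitting: for an entropy-regularized objective with temperature lam, the transport step moves each sampled action along the action-gradient of a critic with step size tau, and the heat step convolves the action distribution with a Gaussian of variance 2 * lam * tau, which for samples amounts to adding Gaussian noise. The quadratic critic, the single fixed state, and all names (grad_q, transport_step, heat_step, run_splitting, tau, lam) are hypothetical choices made for illustration only.

```python
import numpy as np


def grad_q(actions, target):
    """Action-gradient of the toy critic Q(a) = -0.5 * ||a - target||^2 (assumed)."""
    return target - actions


def transport_step(actions, target, tau):
    """Move each action sample along the critic gradient (assumed transport update)."""
    return actions + tau * grad_q(actions, target)


def heat_step(actions, lam, tau, rng):
    """Gaussian convolution on samples: add N(0, 2 * lam * tau * I) noise."""
    noise = rng.standard_normal(actions.shape)
    return actions + np.sqrt(2.0 * lam * tau) * noise


def run_splitting(num_particles=20000, dim=2, tau=0.1, lam=0.5, steps=200, seed=0):
    """Alternate transport and heat steps on action samples for one fixed state."""
    rng = np.random.default_rng(seed)
    target = np.ones(dim)  # maximizer of the toy critic
    # Samples stand in for an implicit policy: only sampling is needed,
    # and no log density is evaluated anywhere in the loop.
    actions = 3.0 * rng.standard_normal((num_particles, dim))
    for _ in range(steps):
        actions = transport_step(actions, target, tau)
        actions = heat_step(actions, lam, tau, rng)
    return actions, target


if __name__ == "__main__":
    samples, target = run_splitting()
    print("sample mean:", samples.mean(axis=0))
    print("sample variance:", samples.var(axis=0))
```

For this toy critic the samples concentrate around the maximizer of Q, and their per-coordinate variance settles near lam / (1 - tau / 2), which approaches the temperature lam as tau shrinks. A practical actor-critic version would replace grad_q with the gradient of a learned critic and fit a pushforward map to the updated samples; those parts are outside this sketch.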
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control
