Policy Optimization as Wasserstein Gradient Flows
Ruiyi Zhang, Changyou Chen, Chunyuan Li, Lawrence Carin

TL;DR
This paper introduces a novel mathematical framework for policy optimization in reinforcement learning by interpreting it as Wasserstein gradient flows on the space of probability measures, providing theoretical insights and practical algorithms.
Contribution
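It places policy optimization in the space of probability measures and interprets it as Wasserstein gradient flows; under specified circumstances, the problem becomes convex in the policy distribution. It also develops efficient algorithms that numerically solve the corresponding discrete gradient flows.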
Findings
Empirical results show improved performance over existing algorithms.
The framework relates many state-of-the-art policy-optimization algorithms to a common mathematical principle.
The approach is applicable to multiple RL settings.
Abstract
Policy optimization is a core component of reinforcement learning (RL), and most existing RL methods directly optimize the parameters of a policy by maximizing the expected total reward or a surrogate of it. Though these methods often achieve encouraging empirical success, the underlying mathematical principle of policy-distribution optimization is unclear. We place policy optimization into the space of probability measures, and interpret it as Wasserstein gradient flows. On the probability-measure space, under specified circumstances, policy optimization becomes a convex problem in terms of distribution optimization. To make optimization feasible, we develop efficient algorithms by numerically solving the corresponding discrete gradient flows. Our technique is applicable to several RL settings, and is related to many state-of-the-art policy-optimization algorithms. Empirical results verify…
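To make the idea of numerically solving a discrete gradient flow concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes an entropy-regularized energy functional F(mu) = -E_mu[J(theta)] + alpha * E_mu[log mu(theta)] over distributions mu of policy parameters, which the abstract does not spell out. Under that assumption, the Wasserstein gradient flow of F is a Fokker-Planck equation, and it can be simulated with particles that follow unadjusted Langevin dynamics. The toy reward J, the temperature alpha, and all function names below are hypothetical and chosen only for illustration.

```python
import math
import random


def grad_reward(theta):
    """Gradient of the hypothetical toy reward J(theta) = -(theta - 2)^2."""
    return -2.0 * (theta - 2.0)


def langevin_particles(n_particles=2000, n_steps=500, step=0.01, alpha=0.5, seed=0):
    """Evolve particles with the unadjusted Langevin update

        theta <- theta + step * grad J(theta) + sqrt(2 * alpha * step) * noise,

    a time discretization of the Wasserstein gradient flow of the assumed
    functional F(mu) = -E_mu[J] + alpha * E_mu[log mu].
    """
    rng = random.Random(seed)
    # Initial particle distribution: standard normal (an arbitrary choice).
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    noise_scale = math.sqrt(2.0 * alpha * step)
    for _ in range(n_steps):
        particles = [
            th + step * grad_reward(th) + noise_scale * rng.gauss(0.0, 1.0)
            for th in particles
        ]
    return particles


def mean_and_variance(xs):
    """Empirical mean and (biased) variance of a list of floats."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v


if __name__ == "__main__":
    mean, var = mean_and_variance(langevin_particles())
    print(f"mean={mean:.3f} var={var:.3f}")
```

For this toy problem the minimizer of F is proportional to exp(J(theta) / alpha), a Gaussian with mean 2 and variance alpha / 2 = 0.25, so the particle statistics should settle near those values up to discretization and sampling error. The functional is convex in mu even though a general reward J need not be concave in theta, which illustrates the kind of distribution-level convexity the abstract refers to.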
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Advanced Neural Network Applications
