Riemannian Proximal Policy Optimization
Shijun Wang, Baocheng Zhu, Chen Li, Mingzhe Wu, James Zhang, Wei Chu, Yuan Qi

TL;DR
This paper introduces a Riemannian proximal policy optimization algorithm for Markov decision processes, leveraging Gaussian mixture models and Wasserstein distance bounds, with proven convergence and promising experimental results.
Contribution
It presents a novel Riemannian optimization framework for policy learning in MDPs using GMMs, with convergence guarantees and policy improvement bounds.
Findings
The proposed algorithm comes with a convergence guarantee.
Preliminary experiments show the efficacy of the proposed algorithm.
Bounds on policy improvement are derived using Wasserstein distance.
Abstract
In this paper, we propose a general Riemannian proximal optimization algorithm with guaranteed convergence to solve Markov decision process (MDP) problems. To model policy functions in an MDP, we employ a Gaussian mixture model (GMM) and formulate policy optimization as a nonconvex optimization problem in the Riemannian space of positive semidefinite matrices. For two given policy functions, we also provide a lower bound on policy improvement by using bounds derived from the Wasserstein distance of GMMs. Preliminary experiments show the efficacy of our proposed Riemannian proximal policy optimization algorithm.
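The abstract does not state the bounds themselves. As background only, the sketch below computes the well-known closed-form squared 2-Wasserstein distance between two single Gaussians, which is a common building block when bounding the Wasserstein distance between GMMs. It is not the paper's bound or algorithm; the function names `psd_sqrt` and `gaussian_w2_squared` are hypothetical and chosen for illustration.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a positive semidefinite matrix (via eigh)."""
    sym = 0.5 * (mat + mat.T)            # remove floating-point asymmetry
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)      # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_w2_squared(mean_a, cov_a, mean_b, cov_b):
    """Squared 2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b).

    Closed form: ||m_a - m_b||^2 + tr(S_a + S_b - 2 (S_a^{1/2} S_b S_a^{1/2})^{1/2}).
    """
    root_a = psd_sqrt(cov_a)
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    mean_term = float(np.sum((mean_a - mean_b) ** 2))
    cov_term = float(np.trace(cov_a + cov_b - 2.0 * cross))
    return mean_term + cov_term


# Illustrative inputs (assumed values, not taken from the paper).
mean_a = np.array([0.0, 0.0])
mean_b = np.array([3.0, 4.0])
identity = np.eye(2)
shift_only = gaussian_w2_squared(mean_a, identity, mean_b, identity)
scale_only = gaussian_w2_squared(mean_a, identity, mean_a, 4.0 * identity)
```

With equal covariances the distance reduces to the squared Euclidean distance between the means, and with equal means it depends only on the covariance matrices, which is where the geometry of positive semidefinite matrices enters.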
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Human Pose and Action Recognition
