Riemannian Proximal Policy Optimization

Shijun Wang; Baocheng Zhu; Chen Li; Mingzhe Wu; James Zhang; Wei Chu; Yuan Qi

arXiv:2005.09195·cs.LG·May 20, 2020

TL;DR

This paper introduces a Riemannian proximal policy optimization algorithm for Markov decision processes, leveraging Gaussian mixture models and Wasserstein distance bounds, with proven convergence and promising experimental results.

Contribution

It presents a novel Riemannian optimization framework for policy learning in MDPs using GMMs, with convergence guarantees and policy improvement bounds.

Findings

1. The algorithm has guaranteed convergence.
2. Preliminary experiments indicate that the algorithm is effective.
3. A lower bound on policy improvement is derived using bounds from the Wasserstein distance of GMMs.

Abstract

In this paper, we propose a general Riemannian proximal optimization algorithm with guaranteed convergence to solve Markov decision process (MDP) problems. To model policy functions in an MDP, we employ a Gaussian mixture model (GMM) and formulate the policy learning task as a nonconvex optimization problem in the Riemannian space of positive semidefinite matrices. For two given policy functions, we also provide a lower bound on policy improvement by using bounds derived from the Wasserstein distance of GMMs. Preliminary experiments show the efficacy of our proposed Riemannian proximal policy optimization algorithm.
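
As an illustration of the Wasserstein machinery the abstract refers to, the sketch below computes the closed-form squared 2-Wasserstein distance between two single Gaussians, W2^2 = ||m_a - m_b||^2 + tr(S_a + S_b - 2 (S_b^{1/2} S_a S_b^{1/2})^{1/2}). This is a standard result and a common building block for bounds between GMMs; it is not the paper's bound, which is not reproduced on this page. The function names `psd_sqrt` and `gaussian_w2_squared` are hypothetical and chosen only for this example.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a positive semidefinite matrix.

    Uses an eigendecomposition; tiny negative eigenvalues caused by
    round-off are clipped to zero.
    """
    sym = (mat + mat.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_w2_squared(mean_a, cov_a, mean_b, cov_b):
    """Squared 2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b).

    Illustrative helper (an assumption of this example, not the paper's code).
    """
    mean_a = np.asarray(mean_a, dtype=float)
    mean_b = np.asarray(mean_b, dtype=float)
    cov_a = np.asarray(cov_a, dtype=float)
    cov_b = np.asarray(cov_b, dtype=float)

    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)
    # Covariance term (squared Bures distance); clamp round-off below zero.
    bures = max(float(np.trace(cov_a + cov_b - 2.0 * cross)), 0.0)
    return float(np.sum((mean_a - mean_b) ** 2)) + bures


if __name__ == "__main__":
    identity = np.eye(2)
    # Same covariance, means differ by (3, 4): only the mean term contributes.
    print(gaussian_w2_squared([0.0, 0.0], identity, [3.0, 4.0], identity))
```

For mixtures, a weighted combination of such pairwise component distances under a coupling of the mixture weights gives an upper bound on the Wasserstein distance between the two GMMs; whether the paper uses exactly that construction is not stated here.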

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Human Pose and Action Recognition