PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Perttu Hämäläinen; Amin Babadi; Xiaoxiao Ma; Jaakko Lehtinen

arXiv:1810.02541 · cs.LG · November 4, 2020


TL;DR

PPO-CMA enhances the standard PPO reinforcement learning algorithm by adaptively expanding the exploration variance, an idea inspired by CMA-ES. This leads to faster learning and reduced hyperparameter sensitivity in continuous control tasks.

Contribution

This paper introduces PPO-CMA, a variant of PPO that incorporates CMA-ES-style covariance matrix adaptation to improve exploration and robustness in reinforcement learning with continuous action spaces.
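The variance-expansion idea behind covariance matrix adaptation can be illustrated with a toy, CMA-ES-style update. This sketch is not the PPO-CMA implementation: it assumes that only samples with positive advantage are kept and that they are weighted by their advantage, and the helpers `weighted_cov` and `cma_style_update` are hypothetical names. It compares two covariance estimates for the same selected samples: one centered on the old mean, as in the CMA-ES rank-μ update, and one centered on the new mean.

```python
import numpy as np


def weighted_cov(samples, weights, center):
    """Weighted covariance of `samples` around a given `center` point."""
    diffs = samples - center                     # (N, D) deviations from center
    w = weights / weights.sum()                  # normalize weights to sum to 1
    return (w[:, None] * diffs).T @ diffs        # sum_i w_i d_i d_i^T -> (D, D)


def cma_style_update(mean, cov, samples, advantages):
    """Hypothetical, simplified CMA-ES-inspired update of a Gaussian.

    Assumption: keep only positive-advantage samples, weighted by advantage
    (assumes at least one such sample exists). The new covariance is estimated
    around the OLD mean, so it stretches along the direction the good samples
    moved. `cov` is unused here; a fuller version would blend old and new.
    """
    keep = advantages > 0
    x, w = samples[keep], advantages[keep]
    new_cov = weighted_cov(x, w, center=mean)    # CMA-ES style: old mean
    new_mean = (w[:, None] * x).sum(axis=0) / w.sum()
    return new_mean, new_cov


# Toy 1-D demo: maximize f(x) = x starting from N(0, 1).
rng = np.random.default_rng(0)
mean, cov = np.zeros(1), np.eye(1)
samples = rng.multivariate_normal(mean, cov, size=10_000)
f = samples[:, 0]
adv = f - f.mean()                               # baseline-subtracted "advantage"

new_mean, cov_old_center = cma_style_update(mean, cov, samples, adv)

# For comparison: the same samples and weights, centered on the NEW mean.
keep = adv > 0
cov_new_center = weighted_cov(samples[keep], adv[keep], center=new_mean)
print(new_mean, cov_old_center, cov_new_center)
```

On this linear toy objective, the estimate around the old mean comes out larger than the initial variance of 1 (approaching 2 in the large-sample limit), while the estimate around the new mean comes out smaller (approaching 2 − π/2 ≈ 0.43). Centering on the old mean is what lets the distribution stretch along the direction of progress instead of collapsing.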

Findings

01. Significantly improves performance on Roboschool continuous control benchmarks.

02. Significantly less sensitive to hyperparameter choices than PPO.

03. Speeds up learning and makes the algorithm less prone to getting stuck in local optima in continuous control tasks.

Abstract

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress and may make the algorithm prone to getting stuck in local optima. Drawing inspiration from CMA-ES, a black-box evolutionary optimization method designed for robustness in similar situations, we propose PPO-CMA, a proximal policy optimization approach that adaptively expands the exploration variance to speed up progress. With only minor changes to PPO, our algorithm considerably improves performance in Roboschool continuous control benchmarks. Our results also show that PPO-CMA, as opposed to PPO, is significantly less sensitive to the choice of hyperparameters, allowing one to use it in complex movement optimization tasks without requiring tedious…
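For reference, the following sketch shows the standard PPO clipped surrogate objective for a diagonal Gaussian policy, which is the kind of continuous-action policy the abstract discusses. It is plain PPO, not PPO-CMA. The function names and the default `eps=0.2` are assumptions chosen for illustration. Because `log_std` enters the log-probability, maximizing this objective also updates the exploration variance, which is the quantity the abstract reports PPO can shrink prematurely.

```python
import numpy as np


def gaussian_log_prob(actions, mean, log_std):
    """Log-density of actions under a diagonal Gaussian policy, summed over action dims."""
    var = np.exp(2.0 * log_std)
    return -0.5 * np.sum(
        (actions - mean) ** 2 / var + 2.0 * log_std + np.log(2.0 * np.pi), axis=-1
    )


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """PPO clipped surrogate objective (to be maximized), averaged over samples."""
    ratio = np.exp(new_logp - old_logp)                  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))       # pessimistic (lower) bound
```

When the new and old policies coincide, every ratio is 1 and the objective reduces to the mean advantage. Taking the minimum with the clipped term removes the incentive to push the probability ratio outside [1 − eps, 1 + eps].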


Code & Models

Repositories

ppocma/ppocma (TensorFlow, official)


Taxonomy

Methods: Entropy Regularization · Proximal Policy Optimization