TL;DR
PPO-CMA modifies the PPO reinforcement learning algorithm to adaptively expand its exploration variance, an idea borrowed from CMA-ES. This leads to faster learning and lower hyperparameter sensitivity in continuous control tasks.
Contribution
This paper introduces PPO-CMA, a novel variant of PPO that incorporates covariance matrix adaptation to improve exploration and robustness in continuous action space reinforcement learning.
Findings
Significantly improves performance on Roboschool benchmarks.
Less sensitive to hyperparameter choices than PPO.
Speeds up learning and reduces the risk of getting stuck in local optima in continuous control tasks.
Abstract
Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress and may make the algorithm prone to getting stuck in local optima. Drawing inspiration from CMA-ES, a black-box evolutionary optimization method designed for robustness in similar situations, we propose PPO-CMA, a proximal policy optimization approach that adaptively expands the exploration variance to speed up progress. With only minor changes to PPO, our algorithm considerably improves performance in Roboschool continuous control benchmarks. Our results also show that PPO-CMA, as opposed to PPO, is significantly less sensitive to the choice of hyperparameters, allowing one to use it in complex movement optimization tasks without requiring tedious…
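To build intuition for how a CMA-ES-style update can expand the exploration variance rather than shrink it, the toy sketch below compares two ways of re-estimating a 1-D Gaussian's variance from the better-than-average samples. Measuring spread around the new mean ignores how far the mean moved, while measuring it around the old mean (as CMA-ES's rank-mu update does) adds the squared shift. This is not the paper's algorithm: the linear toy reward, the positive-advantage weighting, and the helper name `weighted_variance` are all assumptions made for illustration.

```python
import numpy as np


def weighted_variance(samples, weights, center):
    """Weighted mean squared deviation of samples from a given center."""
    w = weights / weights.sum()
    return float(np.sum(w * (samples - center) ** 2))


rng = np.random.default_rng(0)

# Current 1-D Gaussian "policy" (hypothetical values).
old_mean, old_std = 0.0, 1.0
actions = rng.normal(old_mean, old_std, size=1000)

# Toy assumption: reward equals the action, baseline is the mean reward.
advantages = actions - actions.mean()

# Keep only actions that did better than the baseline.
weights = np.maximum(advantages, 0.0)

# Weighted mean of the selected actions becomes the new mean.
new_mean = float(np.sum(weights * actions) / weights.sum())

# Variance around the NEW mean: ignores the distance the mean travelled.
var_new_center = weighted_variance(actions, weights, new_mean)

# Variance around the OLD mean: includes the squared mean shift, so the
# distribution stretches in the direction of progress.
var_old_center = weighted_variance(actions, weights, old_mean)

print(f"mean shift:               {new_mean - old_mean:.3f}")
print(f"variance around new mean: {var_new_center:.3f}")
print(f"variance around old mean: {var_old_center:.3f}")
```

In this setup the old-mean estimate exceeds the new-mean estimate by exactly the squared mean shift, so the further the selected samples pull the mean, the more the variance grows along that direction.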
Taxonomy
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Entropy Regularization · Proximal Policy Optimization
