Proximal Policy Optimization with Evolutionary Mutations
Casimir Czworkowski, Stephen Hornish, Alhassan S. Yasin

TL;DR
This paper introduces POEM, an enhancement to PPO that incorporates evolutionary-inspired adaptive mutations to improve exploration and performance in reinforcement learning tasks.
Contribution
POEM is the first to integrate adaptive evolutionary mutations into PPO. It triggers a mutation of the policy parameters when policy updates stagnate, which promotes exploration and improves results on three of the four benchmark tasks.
Findings
POEM outperforms PPO on three of four benchmark tasks.
The improvements are statistically significant on BipedalWalker, CarRacing, and MountainCar.
No significant difference observed in LunarLander.
Abstract
Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm known for its stability and sample efficiency, but it often suffers from premature convergence due to limited exploration. In this paper, we propose POEM (Proximal Policy Optimization with Evolutionary Mutations), a novel modification to PPO that introduces an adaptive exploration mechanism inspired by evolutionary algorithms. POEM enhances policy diversity by monitoring the Kullback-Leibler (KL) divergence between the current policy and a moving average of previous policies. When policy changes become minimal, indicating stagnation, POEM triggers an adaptive mutation of policy parameters to promote exploration. We evaluate POEM on four OpenAI Gym environments: CarRacing, MountainCar, BipedalWalker, and LunarLander. Through extensive fine-tuning using Bayesian optimization techniques and statistical…
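The stagnation check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single categorical policy over discrete actions, an exponential moving average of past action probabilities as the "moving average of previous policies", and Gaussian noise on the logits as the mutation. The class name `StagnationMutator`, the parameters `kl_threshold`, `ema_decay`, and `base_sigma`, and the rule that scales the noise as the KL divergence falls are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return max(float(np.sum(p * np.log(p / q))), 0.0)


class StagnationMutator:
    """Hypothetical helper: mutate policy logits when the policy stops changing."""

    def __init__(self, n_actions, kl_threshold=1e-3, ema_decay=0.9,
                 base_sigma=0.1, seed=0):
        self.kl_threshold = kl_threshold  # assumed stagnation threshold
        self.ema_decay = ema_decay        # assumed moving-average decay
        self.base_sigma = base_sigma      # assumed maximum mutation scale
        # Start the moving average at the uniform policy (an assumption).
        self.avg_probs = np.full(n_actions, 1.0 / n_actions)
        self.rng = np.random.default_rng(seed)

    def step(self, logits):
        """Return (possibly mutated logits, KL to the moving average, mutated flag)."""
        probs = softmax(logits)
        kl = kl_divergence(probs, self.avg_probs)
        mutated = kl < self.kl_threshold
        if mutated:
            # Assumed adaptive rule: the smaller the KL, the larger the noise.
            sigma = self.base_sigma * (1.0 - kl / self.kl_threshold)
            logits = logits + self.rng.normal(0.0, sigma, size=logits.shape)
        # Update the moving average with the pre-mutation policy.
        self.avg_probs = (self.ema_decay * self.avg_probs
                          + (1.0 - self.ema_decay) * probs)
        return logits, kl, mutated
```

In a full PPO training loop, the KL divergence would more plausibly be averaged over a batch of states and the noise applied to the network weights rather than to a single logit vector. The sketch only shows the control flow: measure the divergence from the averaged policy, and mutate once it drops below a threshold.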
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms
