Proximal Policy Optimization with Evolutionary Mutations

Casimir Czworkowski; Stephen Hornish; Alhassan S. Yasin

arXiv:2601.14705·cs.NE·January 22, 2026

Open Access

TL;DR

This paper introduces POEM, an extension of PPO that adds adaptive, evolution-inspired mutations of the policy parameters to improve exploration and performance on reinforcement learning tasks.

Contribution

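The authors present POEM as the first method to integrate adaptive evolutionary mutations into PPO. A mutation is triggered to promote exploration when policy updates stagnate, which improved results over standard PPO on three of the four benchmark tasks.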

Findings

1. POEM outperforms PPO on three of the four benchmark tasks.
2. The improvements are statistically significant on BipedalWalker, CarRacing, and MountainCar.
3. No significant difference was observed on LunarLander.

Abstract

Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm known for its stability and sample efficiency, but it often suffers from premature convergence due to limited exploration. In this paper, we propose POEM (Proximal Policy Optimization with Evolutionary Mutations), a novel modification to PPO that introduces an adaptive exploration mechanism inspired by evolutionary algorithms. POEM enhances policy diversity by monitoring the Kullback-Leibler (KL) divergence between the current policy and a moving average of previous policies. When policy changes become minimal, indicating stagnation, POEM triggers an adaptive mutation of policy parameters to promote exploration. We evaluate POEM on four OpenAI Gym environments: CarRacing, MountainCar, BipedalWalker, and LunarLander. Through extensive fine-tuning using Bayesian optimization techniques and statistical…
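
The abstract describes the stagnation trigger only at a high level. What follows is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation. It assumes a tabular softmax policy (`theta[s][a]` are logits for state `s`), measures stagnation as the mean KL(current policy ∥ moving-average policy) over states, where the moving average is an exponential moving average of past action distributions, and mutates by adding Gaussian noise to the logits with a scale that grows while stagnation persists and resets otherwise. The class `StagnationMutator`, its `step` method, the helpers `softmax` and `kl_divergence`, and all parameters (`kl_threshold`, `ema_decay`, `sigma0`, `sigma_growth`, `sigma_max`) are hypothetical names chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


class StagnationMutator:
    """Hypothetical KL-triggered mutation rule; all details are assumptions."""

    def __init__(self, kl_threshold=1e-3, ema_decay=0.9, sigma0=0.05,
                 sigma_growth=1.5, sigma_max=1.0, seed=0):
        self.kl_threshold = kl_threshold  # below this mean KL, updates count as stagnant
        self.ema_decay = ema_decay        # weight kept by the moving average each step
        self.sigma0 = sigma0              # initial / reset mutation scale
        self.sigma = sigma0
        self.sigma_growth = sigma_growth  # scale multiplier while stagnation persists
        self.sigma_max = sigma_max
        self.avg_probs = None             # per-state moving average of past action distributions
        self.rng = random.Random(seed)

    def step(self, theta):
        """Call after each PPO update. Returns (theta, mean_kl, mutated)."""
        probs = [softmax(row) for row in theta]
        if self.avg_probs is None:
            # First call: nothing to compare against yet, so just seed the average.
            self.avg_probs = probs
            return theta, float("inf"), False

        # Mean over states of KL(current policy || moving-average policy).
        mean_kl = sum(kl_divergence(p, q) for p, q in zip(probs, self.avg_probs)) / len(probs)

        mutated = mean_kl < self.kl_threshold
        if mutated:
            # Stagnation: add Gaussian noise to every logit, then widen the noise
            # so repeated stagnation produces progressively larger jumps.
            theta = [[w + self.rng.gauss(0.0, self.sigma) for w in row] for row in theta]
            probs = [softmax(row) for row in theta]
            self.sigma = min(self.sigma * self.sigma_growth, self.sigma_max)
        else:
            # The policy is still moving on its own: reset the mutation scale.
            self.sigma = self.sigma0

        # Fold the policy actually used from now on into the moving average.
        b = self.ema_decay
        self.avg_probs = [[b * q + (1 - b) * p for p, q in zip(pr, qr)]
                          for pr, qr in zip(probs, self.avg_probs)]
        return theta, mean_kl, mutated


# Usage: an unchanged policy has zero KL to its own average, so the second call mutates.
mutator = StagnationMutator(kl_threshold=1e-3, seed=42)
theta = [[0.0, 0.0, 0.0], [1.0, -1.0, 0.5]]
theta, kl, mutated = mutator.step(theta)  # initializes the moving average
theta, kl, mutated = mutator.step(theta)
print(mutated)  # True
```

In a full PPO loop, `step` would be called once after each policy update (for example, after the epochs of clipped-objective optimization on a rollout batch), and the returned parameters would be used for the next rollout. The abstract does not state whether POEM averages parameters or action distributions, which KL direction it uses, or how the mutation scale adapts; the choices in this sketch are illustrative only.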

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms