Evolutionary Strategy Guided Reinforcement Learning via MultiBuffer Communication
Adam Callaghan, Karl Mason, Patrick Mannion

TL;DR
This paper introduces a novel Evolutionary Reinforcement Learning framework that combines Evolutionary Strategies with TD3 using a multi-buffer system, enhancing policy search and performance on control tasks.
Contribution
It presents a new multi-buffer approach that improves policy exploration and performance in Evolutionary Reinforcement Learning by integrating Evolutionary Strategies with TD3.
Findings
Outperforms CEM-RL on 3 of 4 MuJoCo tasks
Enables freer policy search without buffer overpopulation issues
Demonstrates competitive results with state-of-the-art algorithms
Abstract
Evolutionary Algorithms and Deep Reinforcement Learning have both successfully solved control problems across a variety of domains. Recently, algorithms have been proposed which combine these two methods, aiming to leverage the strengths and mitigate the weaknesses of both approaches. In this paper we introduce a new Evolutionary Reinforcement Learning model which combines a particular family of Evolutionary Algorithms called Evolutionary Strategies with the off-policy Deep Reinforcement Learning algorithm TD3. The framework utilises a multi-buffer system instead of a single shared replay buffer. The multi-buffer system allows the Evolutionary Strategy to search freely in the space of policies, without running the risk of overpopulating the replay buffer with poorly performing trajectories, which limit the number of desirable policy behaviour examples, thus negatively…
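The abstract is cut off before it explains how the buffers communicate, so the following is only a minimal sketch of the general idea of keeping Evolutionary Strategy experience and TD3 experience in separate replay buffers. The class name `MultiBuffer`, the `es_fraction` mixing ratio, and the `source` tags are hypothetical names chosen for illustration; they are not taken from the paper and may differ from the authors' design.

```python
import random
from collections import deque


class MultiBuffer:
    """Illustrative two-buffer replay store (an assumption, not the paper's code).

    Transitions from the ES population and from the RL (TD3) actor are kept in
    separate fixed-size buffers, so a flood of ES rollouts can never evict
    transitions collected by the RL actor.
    """

    def __init__(self, capacity=100_000, es_fraction=0.25, seed=0):
        self.rl_buffer = deque(maxlen=capacity)
        self.es_buffer = deque(maxlen=capacity)
        # Hypothetical share of each training batch drawn from the ES buffer.
        self.es_fraction = es_fraction
        self.rng = random.Random(seed)

    def add(self, transition, source):
        """Store a transition; `source` is "rl" or "es"."""
        if source == "rl":
            self.rl_buffer.append(transition)
        elif source == "es":
            self.es_buffer.append(transition)
        else:
            raise ValueError(f"unknown source: {source!r}")

    def sample(self, batch_size):
        """Draw a mixed batch, capped by what each buffer currently holds."""
        n_es = min(int(batch_size * self.es_fraction), len(self.es_buffer))
        n_rl = min(batch_size - n_es, len(self.rl_buffer))
        batch = self.rng.sample(list(self.es_buffer), n_es)
        batch += self.rng.sample(list(self.rl_buffer), n_rl)
        self.rng.shuffle(batch)
        return batch


# Usage: many ES transitions do not displace the few RL transitions.
buffers = MultiBuffer(capacity=5, es_fraction=0.25)
for i in range(50):
    buffers.add(("es", i), source="es")
for i in range(3):
    buffers.add(("rl", i), source="rl")
batch = buffers.sample(batch_size=4)
```

With a single shared buffer of the same capacity, the 50 ES transitions in this toy run would have pushed out earlier data regardless of its quality; with separate buffers, the RL buffer still holds all three of its transitions. How the actual framework fills, filters, and samples its buffers is not recoverable from the truncated abstract.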
Taxonomy
Topics: Smart Grid Energy Management · Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics
Methods: Target Policy Smoothing · Adam · Experience Replay · Dense Connections · Clipped Double Q-learning · Twin Delayed Deep Deterministic
