ES Is More Than Just a Traditional Finite-Difference Approximator
Joel Lehman, Jay Chen, Jeff Clune, Kenneth O. Stanley

TL;DR
This paper explores how a specific evolution strategy (ES) optimizes for robustness to parameter perturbations in neural networks, which leads it to different solutions than traditional gradient-based methods, with demonstrated benefits in reinforcement learning tasks.
Contribution
It shows that this ES variant seeks parameters that are robust to perturbation, which distinguishes it from gradient descent, and discusses the implications for neural network optimization.
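One way to state this distinction, assuming the common isotropic-Gaussian form of this ES (the notation here is chosen for illustration: F is the reward, θ the parameters, σ the perturbation scale, n the population size), is that ES follows the gradient of the population's expected reward J rather than the gradient of F itself:

```latex
\begin{aligned}
J(\theta) &= \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\left[ F(\theta + \sigma \epsilon) \right] \\
\nabla_\theta J(\theta) &= \frac{1}{\sigma}\, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\left[ F(\theta + \sigma \epsilon)\, \epsilon \right]
  \approx \frac{1}{n \sigma} \sum_{i=1}^{n} F(\theta + \sigma \epsilon_i)\, \epsilon_i
\end{aligned}
```

A finite-difference method instead estimates \nabla_\theta F(\theta). For a smooth F, \nabla_\theta J approaches \nabla_\theta F as σ → 0, so the two diverge mainly when σ is large relative to how quickly F changes; a sharp, narrow optimum then contributes little to J.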
Findings
ES produces more robust policies than gradient-based methods in reinforcement learning.
Networks optimized with ES show greater resilience to parameter perturbations.
Robustness-seeking steers ES into different regions of the search space than gradient descent, and consequently toward networks with different properties.
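Resilience to parameter perturbations can be probed by adding Gaussian noise to every parameter and tracking how average reward falls as the noise grows. The sketch below is an illustration under stated assumptions, not the paper's protocol: `perturbation_curve` is a hypothetical helper, and the two quadratic reward functions are toy stand-ins for a sharp and a flat optimum with equal peak reward.

```python
import numpy as np

def perturbation_curve(params, reward_fn, sigmas, n_samples=200, seed=0):
    """Estimate mean reward after adding N(0, sigma^2 I) noise to params.

    params:    1-D array of flattened parameters.
    reward_fn: callable mapping a parameter vector to a scalar reward
               (hypothetical; e.g. one rollout of a policy network).
    Returns a list of (sigma, mean_reward) pairs.
    """
    rng = np.random.default_rng(seed)
    curve = []
    for sigma in sigmas:
        # One fresh batch of isotropic Gaussian perturbations per noise level.
        noise = rng.standard_normal((n_samples, params.size))
        rewards = [reward_fn(params + sigma * eps) for eps in noise]
        curve.append((sigma, float(np.mean(rewards))))
    return curve

# Toy stand-ins: both optima sit at the origin with reward 1.0,
# but the "sharp" one loses reward 100x faster as parameters move away.
def sharp_reward(p):
    return 1.0 - 100.0 * float(np.sum(p**2))

def flat_reward(p):
    return 1.0 - 1.0 * float(np.sum(p**2))

theta = np.zeros(10)
sigmas = [0.0, 0.01, 0.05, 0.1]
sharp_curve = perturbation_curve(theta, sharp_reward, sigmas)
flat_curve = perturbation_curve(theta, flat_reward, sigmas)
for (s, r_sharp), (_, r_flat) in zip(sharp_curve, flat_curve):
    print(f"sigma={s:.2f}  sharp={r_sharp:.3f}  flat={r_flat:.3f}")
```

A flatter curve indicates a solution that tolerates larger perturbations; in a real experiment, `reward_fn` would run the policy network in its environment.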
Abstract
An evolution strategy (ES) variant based on a simplification of a natural evolution strategy recently attracted attention because it performs surprisingly well in challenging deep reinforcement learning domains. It searches for neural network parameters by generating perturbations to the current set of parameters, checking their performance, and moving in the aggregate direction of higher reward. Because it resembles a traditional finite-difference approximation of the reward gradient, it can naturally be confused with one. However, this ES optimizes for a different gradient than just reward: It optimizes for the average reward of the entire population, thereby seeking parameters that are robust to perturbation. This difference can channel ES into distinct areas of the search space relative to gradient descent, and also consequently to networks with distinct properties. This unique…
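The abstract's description of ES (perturb the parameters, evaluate, move in the aggregate direction of higher reward) and its contrast with finite differences can be illustrated on a toy problem. This is a hypothetical 1-D sketch, not the authors' experimental setup: the reward function, step sizes, σ, population size, and the use of antithetic (mirrored) sampling are all assumptions chosen to make the effect visible. Both methods start on the slope of a tall, narrow peak, with a lower, broad peak nearby.

```python
import numpy as np

def reward(x):
    """Hypothetical 1-D reward: a tall, narrow peak at x=0 plus a lower, broad peak at x=3."""
    narrow = 1.0 * np.exp(-x**2 / (2 * 0.05**2))
    broad = 0.6 * np.exp(-(x - 3.0)**2 / (2 * 1.0**2))
    return narrow + broad

def fd_ascent(x, lr=1e-3, h=1e-4, steps=2000):
    """Gradient ascent on reward(x) using a central finite-difference gradient."""
    for _ in range(steps):
        grad = (reward(x + h) - reward(x - h)) / (2 * h)
        x = x + lr * grad
    return x

def es_ascent(x, sigma=1.0, lr=0.5, n_pairs=100, steps=300, seed=0):
    """ES-style ascent: estimate the gradient of E[reward(x + sigma*eps)]
    from antithetic Gaussian perturbations and step along it."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        eps = rng.standard_normal(n_pairs)
        f_pos = reward(x + sigma * eps)
        f_neg = reward(x - sigma * eps)
        # Antithetic estimate of (1/sigma) * E[reward(x + sigma*eps) * eps].
        grad = np.sum((f_pos - f_neg) * eps) / (2 * n_pairs * sigma)
        x = x + lr * grad
    return x

def smoothed_reward(x, sigma=1.0, n=100_000, seed=1):
    """Monte Carlo estimate of the average reward of a population around x."""
    eps = np.random.default_rng(seed).standard_normal(n)
    return float(np.mean(reward(x + sigma * eps)))

x0 = 0.05  # start on the slope of the narrow peak
x_fd = fd_ascent(x0)
x_es = es_ascent(x0)
print(f"finite differences: x={x_fd:.3f} reward={reward(x_fd):.3f} "
      f"avg perturbed reward={smoothed_reward(x_fd):.3f}")
print(f"ES (sigma=1.0):     x={x_es:.3f} reward={reward(x_es):.3f} "
      f"avg perturbed reward={smoothed_reward(x_es):.3f}")
```

On this landscape, the finite-difference run should settle on the narrow peak (higher raw reward), while ES with a large σ should drift to the broad peak, which has the higher average reward under perturbation. With σ much smaller than the narrow peak's width, the ES gradient estimate approaches the true gradient and the two methods would be expected to behave alike.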
