On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent
Xingwen Zhang, Jeff Clune, Kenneth O. Stanley

TL;DR
This paper investigates how OpenAI's evolution strategy (ES) relates to stochastic gradient descent (SGD). Using MNIST-based experiments, it measures the correlation between the gradients computed by the two methods and shows that ES performs competitively on MNIST, giving new insight into how the methods differ and where each applies.
Contribution
The paper introduces MNIST-based experiments to measure the correlation between ES and SGD gradients and develops an SGD proxy to predict ES performance, clarifying their relationship.
Findings
ES can achieve 99% accuracy on MNIST, surpassing previous evolutionary methods.
MNIST experiments measure the correlation between the gradients computed by ES and those computed by SGD.
An SGD-based proxy accurately predicts ES performance across different population sizes.
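To make the idea of correlating ES and SGD gradients concrete, here is a minimal sketch on a toy quadratic loss rather than MNIST. It is not the paper's experimental setup: the loss, the dimension, the noise scale `sigma`, the use of mirrored (antithetic) perturbations, and every function name are assumptions chosen for illustration. It estimates a gradient from Gaussian perturbations and compares it with the exact gradient by cosine similarity for a small and a large population.

```python
import math
import random


def loss(theta):
    # Toy quadratic loss (an assumption for illustration); its exact gradient is theta.
    return 0.5 * sum(t * t for t in theta)


def exact_gradient(theta):
    # Analytic gradient of the toy loss, standing in for a backprop gradient.
    return list(theta)


def es_gradient(theta, sigma, pairs, rng):
    # ES-style gradient estimate from mirrored Gaussian perturbations:
    # average of (f(theta + sigma*eps) - f(theta - sigma*eps)) / (2*sigma) * eps.
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = loss([t + sigma * e for t, e in zip(theta, eps)])
        minus = loss([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2.0 * sigma * pairs)
        for i in range(dim):
            grad[i] += scale * eps[i]
    return grad


def cosine(a, b):
    # Cosine similarity between two vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
theta = [rng.uniform(-1.0, 1.0) for _ in range(20)]

# Compare a small and a large population on the same parameter vector.
cos_small = cosine(es_gradient(theta, 0.1, 10, rng), exact_gradient(theta))
cos_large = cosine(es_gradient(theta, 0.1, 2000, rng), exact_gradient(theta))
print("cosine similarity, 10 pairs:  ", cos_small)
print("cosine similarity, 2000 pairs:", cos_large)
```

In this toy setting the estimate aligns more closely with the exact gradient as the number of perturbation pairs grows, which gives some intuition for why population size matters when comparing ES with SGD. The numbers it prints say nothing about the paper's MNIST results.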
Abstract
Because stochastic gradient descent (SGD) has shown promise optimizing neural networks with millions of parameters and few if any alternatives are known to exist, it has moved to the heart of leading approaches to reinforcement learning (RL). For that reason, the recent result from OpenAI showing that a particular kind of evolution strategy (ES) can rival the performance of SGD-based deep RL methods with large neural networks provoked surprise. This result is difficult to interpret in part because of the lingering ambiguity on how ES actually relates to SGD. The aim of this paper is to significantly reduce this ambiguity through a series of MNIST-based experiments designed to uncover their relationship. As a simple supervised problem without domain noise (unlike in most RL), MNIST makes it possible (1) to measure the correlation between gradients computed by ES and SGD and (2) then to…
Taxonomy
Topics: Reinforcement Learning in Robotics · Metaheuristic Optimization Algorithms Research · Evolutionary Algorithms and Applications
Methods: Stochastic Gradient Descent
