Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods
Karel Lenc, Erich Elsen, Tom Schaul, Karen Simonyan

TL;DR
This paper demonstrates that Evolution Strategies (ES) can be scaled to large models and combined with gradient-based methods to train models with non-differentiable parameters such as sparsity masks, enabling larger and more efficient models.
Contribution
It introduces a hybrid optimization approach for models with both differentiable and non-differentiable parameters: gradient descent trains the differentiable weights while ES trains the non-differentiable parameters, and the method scales to large models.
Findings
ES can be scaled to models with millions of parameters.
The hybrid approach is computationally feasible and competitive.
The approach enables training sparse models from the first step.
Abstract
In this work we show that Evolution Strategies (ES) are a viable method for learning non-differentiable parameters of large supervised models. ES are black-box optimization algorithms that estimate distributions of model parameters; however, they have only been used for relatively small problems so far. We show that it is possible to scale ES to more complex tasks and models with millions of parameters. While using ES for differentiable parameters is computationally impractical (although possible), we show that a hybrid approach is practically feasible in the case where the model has both differentiable and non-differentiable parameters. In this approach we use standard gradient-based methods for learning differentiable weights, while using ES for learning non-differentiable parameters, in our case sparsity masks of the weights. This proposed method is surprisingly competitive, and when…
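The split described in the abstract (gradient steps for the weights, ES for the sparsity mask) can be illustrated with a minimal sketch on a toy linear regression. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the hyperparameters, the choice to derive the mask from per-weight scores by keeping the top k, and the antithetic ES estimator. It should be read as one possible instance of the idea, not as the authors' implementation.

```python
import numpy as np


def top_k_mask(scores, k):
    """Binary mask keeping the k highest scores (assumes 1 <= k <= len(scores)).

    The mask is a non-differentiable function of the scores, which is why
    the scores are trained with ES rather than with backpropagation.
    """
    mask = np.zeros_like(scores, dtype=float)
    mask[np.argsort(scores)[-k:]] = 1.0
    return mask


def masked_loss(w, mask, X, y):
    """Mean squared error of a linear model whose weights are gated by the mask."""
    residual = X @ (w * mask) - y
    return float(np.mean(residual ** 2))


def masked_grad(w, mask, X, y):
    """Exact gradient of masked_loss with respect to the weights w."""
    residual = X @ (w * mask) - y
    return 2.0 * (X.T @ residual) / len(y) * mask


def hybrid_train(X, y, k, steps=200, pop=16, sigma=0.5,
                 lr_w=0.05, lr_s=0.5, seed=0):
    """Hypothetical hybrid loop: ES on mask scores, gradient descent on weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w = rng.normal(scale=0.1, size=d)   # differentiable parameters
    scores = np.zeros(d)                # non-differentiable mask parameters
    for _ in range(steps):
        eps = rng.normal(size=(pop, d))
        score_grad = np.zeros(d)
        weight_grad = np.zeros(d)
        for e in eps:
            # Antithetic pair of perturbed scores, each giving a sampled mask.
            m_pos = top_k_mask(scores + sigma * e, k)
            m_neg = top_k_mask(scores - sigma * e, k)
            diff = masked_loss(w, m_pos, X, y) - masked_loss(w, m_neg, X, y)
            # ES estimate of the loss gradient with respect to the scores.
            score_grad += diff * e / (2.0 * sigma * pop)
            # Ordinary weight gradient, averaged over the sampled masks.
            weight_grad += (masked_grad(w, m_pos, X, y)
                            + masked_grad(w, m_neg, X, y)) / (2.0 * pop)
        scores -= lr_s * score_grad
        w -= lr_w * weight_grad
    return w, top_k_mask(scores, k)
```

Calling `hybrid_train(X, y, k=2)` on a small regression problem returns the trained weights and a mask with exactly `k` active entries. Because every sampled mask already has `k` nonzeros, the model in this sketch is sparse throughout training rather than being pruned afterwards.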
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Evolutionary Algorithms and Applications
