Evolutionary Strategies lead to Catastrophic Forgetting in LLMs
Immanuel Abdi, Akshat Gupta, Micah Mok, Alexander Lu, Nicholas Lee, Gopala Anumanchipalli

TL;DR
This paper investigates how Evolutionary Strategies (ES), a gradient-free learning method, cause significant forgetting in large language models during continual training, highlighting a key challenge for online learning.
Contribution
The study provides a comprehensive analysis of ES's forgetting behavior in LLMs, revealing its limitations for continual learning and comparing it to gradient-based methods like GRPO.
Findings
ES achieves near-GRPO performance on math and reasoning tasks
ES exhibits significant forgetting of prior abilities during training
ES updates are less sparse and have larger $ extit{l}_2$ norms than GRPO updates
Abstract
One of the biggest missing capabilities in current AI systems is the ability to learn continuously after deployment. Implementing such continually learning systems have several challenges, one of which is the large memory requirement of gradient-based algorithms that are used to train state-of-the-art LLMs. Evolutionary Strategies (ES) have recently re-emerged as a gradient-free alternative to traditional learning algorithms and have shown encouraging performance on specific tasks in LLMs. In this paper, we perform a comprehensive analysis of ES and specifically evaluate its forgetting curves when training for an increasing number of update steps. We first find that ES is able to reach performance numbers close to GRPO for math and reasoning tasks with a comparable compute budget. However, and most importantly for continual learning, the performance gains in ES is accompanied by…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Multimodal Machine Learning Applications
