ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning

Zhishen Sun; Sizhe Dang; Guang Dai; Haishan Ye

arXiv:2602.01003·cs.LG·May 11, 2026

TL;DR

ESSAM is a reinforcement learning fine-tuning method for large language models that cuts GPU memory usage by 18x relative to PPO and 10x relative to GRPO while keeping accuracy on the GSM8K reasoning task comparable to those methods.

Contribution

ESSAM combines Evolution Strategies with Sharpness-Aware Maximization to improve generalization and memory efficiency in full-parameter LLM fine-tuning, slightly exceeding PPO and matching GRPO on GSM8K.

Findings

1. Achieves 78.27% average accuracy across all tested models on the GSM8K reasoning task, comparable to RL methods (PPO: 77.72%, GRPO: 78.34%).

2. Reduces GPU memory usage by 18x compared to PPO and 10x compared to GRPO.

3. Includes an accelerated variant that doubles speed while maintaining accuracy.

Abstract

Reinforcement learning (RL) has become a key training step for improving mathematical reasoning in large language models (LLMs), but its high GPU memory usage makes it hard to apply in resource-constrained settings. To address this, we propose Evolution Strategies with Sharpness-Aware Maximization (ESSAM), a full-parameter fine-tuning framework that tightly combines the zero-order parameter-space search of Evolution Strategies (ES) with Sharpness-Aware Maximization (SAM) to improve generalization. We conduct fine-tuning experiments on the mainstream mathematical reasoning task GSM8K. The results show that ESSAM achieves an average accuracy of 78.27% across all models, and its overall performance is comparable to RL methods. It surpasses the classic RL algorithm PPO (77.72%) and is comparable to GRPO (78.34%), and even…
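The idea of pairing a zero-order ES search with a sharpness-aware probe can be illustrated on a toy problem. The sketch below is not the authors' implementation (the szs777/ESSAM repository is the reference for that). It assumes a common antithetic ES gradient estimator and assumes the SAM-style probe steps a distance rho toward lower reward before re-estimating the gradient. The names es_gradient and essam_like_step, the toy reward, and every hyperparameter value are hypothetical choices made for illustration only.

```python
import math
import random

TARGET = [1.0, -2.0]  # hypothetical optimum of the toy reward


def reward(theta):
    """Toy reward: negative squared distance to TARGET (higher is better)."""
    return -sum((t - c) ** 2 for t, c in zip(theta, TARGET))


def es_gradient(reward_fn, theta, sigma, pop, rng):
    """Antithetic ES estimate of the reward gradient from reward values only."""
    grad = [0.0] * len(theta)
    for _ in range(pop):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        r_plus = reward_fn([t + sigma * e for t, e in zip(theta, eps)])
        r_minus = reward_fn([t - sigma * e for t, e in zip(theta, eps)])
        scale = (r_plus - r_minus) / (2.0 * sigma * pop)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    return grad


def essam_like_step(reward_fn, theta, rng, lr=0.1, sigma=0.1, rho=0.05, pop=32):
    """One update: ES estimate, SAM-style probe, ES estimate at the probe, ascent."""
    g = es_gradient(reward_fn, theta, sigma, pop, rng)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid dividing by zero
    # Assumption: the probe moves rho toward *lower* reward (the locally worst
    # neighbour), mirroring SAM's inner step for a maximization objective.
    probe = [t - rho * x / norm for t, x in zip(theta, g)]
    g_probe = es_gradient(reward_fn, probe, sigma, pop, rng)
    # Ascend using the gradient estimated at the probe point.
    return [t + lr * x for t, x in zip(theta, g_probe)]


rng = random.Random(0)  # fixed seed so the run is repeatable
theta = [0.0, 0.0]
for _ in range(200):
    theta = essam_like_step(reward, theta, rng)

print("final theta:", theta, "reward:", reward(theta))
```

No backpropagation or optimizer state appears anywhere in the loop; each step needs only forward reward evaluations at perturbed parameters, which is the property that makes ES-style methods attractive when GPU memory is the constraint.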

Code & Models

Repositories

szs777/ESSAM (GitHub)