What Do Evolutionary Coding Agents Evolve?
Nico Pelleriti, Sree Harsha Nelaturu, Zhanke Zhou, Zongze Li, Max Zimmer, Bo Han, Sebastian Pokutta

TL;DR
This paper introduces EvoTrace and EvoReplay to analyze what evolutionary coding agents actually evolve. The analysis shows that most score improvements come from reusing earlier code, and not necessarily from new algorithms.
Contribution
The authors present EvoTrace, a dataset of evolutionary coding traces, and EvoReplay, a methodology for analyzing the search process and distinguishing the mechanisms behind score improvements.
Findings
Most score gains come from reusing previous code lines.
Approximately 30% of code lines are reintroduced after deletion.
Benchmark improvements often result from mechanisms other than new algorithmic structures.
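To make the line-reintroduction finding concrete, here is a minimal sketch of how one might measure it on a single lineage of program versions. It is not the authors' EvoReplay procedure: the function name `reintroduced_fraction`, the whitespace-only normalization, and the assumption of a linear chain of versions (rather than a population tree) are all illustrative choices, and the paper may define the statistic differently.

```python
from typing import List, Set


def normalize(line: str) -> str:
    """Hypothetical normalization: compare lines ignoring surrounding whitespace."""
    return line.strip()


def reintroduced_fraction(versions: List[str]) -> float:
    """Fraction of added lines that restore a line deleted earlier in the trace.

    `versions` is assumed to be one lineage of program sources, oldest first.
    Each version is treated as a set of non-empty normalized lines.
    """
    deleted_earlier: Set[str] = set()  # lines removed at some step and not yet restored
    previous: Set[str] = set()
    added_total = 0
    added_reintroduced = 0

    for index, source in enumerate(versions):
        current = {normalize(l) for l in source.splitlines() if normalize(l)}
        if index > 0:
            added = current - previous
            removed = previous - current
            added_total += len(added)
            # An added line counts as reintroduced if it was deleted in an earlier step.
            added_reintroduced += len(added & deleted_earlier)
            deleted_earlier |= removed
            deleted_earlier -= added  # restored lines are no longer "deleted"
        previous = current

    return added_reintroduced / added_total if added_total else 0.0


if __name__ == "__main__":
    trace = ["a\nb\nc", "a\nc\nd", "a\nb\nd\ne", "a\nb\nc\ne"]
    print(reintroduced_fraction(trace))
```

In the toy trace, four lines are added across the three steps, and two of them (`b` and `c`) had been deleted earlier, so half of the additions are reintroductions. A set-based comparison ignores line order and duplicates; a diff-based variant would be a reasonable alternative.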
Abstract
Recent work pairs LLMs with evolutionary search to iteratively generate, modify, and select code using task-specific feedback. These systems have produced strong results in mathematical discovery and algorithm design, yet a fundamental question remains: what do they actually evolve? Progress is typically summarized by the best score a run reaches under a task-specific evaluator, but that score can reflect several different mechanisms: new algorithmic structure, re-tuning an existing strategy, recombining ideas already in the model's internal knowledge, or overfitting to the evaluator. Distinguishing these mechanisms requires inspecting the search process itself, not only its final outcome. We introduce EvoTrace, a dataset of evolutionary coding traces spanning four evolutionary frameworks, reasoning and non-reasoning models, and 16 tasks across mathematics and algorithm design. To…
