Assessing Reproducibility in Evolutionary Computation: A Case Study using Human- and LLM-based Assessment
Francesca Da Ros, Tarik Začiragić, Aske Plaat, Thomas Bäck, Niki van Stein

TL;DR
This study evaluates reproducibility in evolutionary computation research over a decade, introducing a checklist and an LLM-based system to automate reproducibility assessment, revealing significant gaps and the potential of automation.
Contribution
The paper presents a structured reproducibility checklist and RECAP, an LLM-based system for automated reproducibility evaluation in evolutionary computation research.
Findings
Average reproducibility score of 0.62 among papers.
36.90% of papers provide additional materials.
RECAP achieves a Cohen's kappa of 0.67 in agreement with human evaluators.
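For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa can be computed for two raters labelling the same items. The function name `cohens_kappa` and the example labels are hypothetical and are not taken from the paper; the summary does not state how the authors aggregated checklist items or whether a weighted variant was used, so this shows only the standard unweighted definition, kappa = (p_o - p_e) / (1 - p_e).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # If both raters always use the same single label, agreement is trivially perfect.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical checklist answers (illustrative only, not the paper's data).
human = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
model = ["yes", "yes", "no", "yes", "yes", "no", "no", "no"]

kappa = cohens_kappa(human, model)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this toy example the raters agree on 6 of 8 items (observed agreement 0.75) while chance agreement is 0.50, giving a kappa of 0.50. On the commonly used Landis and Koch scale, values between 0.61 and 0.80, such as the 0.67 reported for RECAP, are usually read as substantial agreement.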
Abstract
Reproducibility is an important requirement in evolutionary computation, where results largely depend on computational experiments. In practice, reproducibility relies on how algorithms, experimental protocols, and artifacts are documented and shared. Despite growing awareness, there is still limited empirical evidence on the actual reproducibility levels of published work in the field. In this paper, we study the reproducibility practices in papers published in the Evolutionary Combinatorial Optimization and Metaheuristics track of the Genetic and Evolutionary Computation Conference over a ten-year period. We introduce a structured reproducibility checklist and apply it through a systematic manual assessment of the selected corpus. In addition, we propose RECAP (REproducibility Checklist Automation Pipeline), an LLM-based system that automatically evaluates reproducibility signals from…
Taxonomy
Topics: Scientific Computing and Data Management · Machine Learning in Materials Science · Software Engineering Research
