Large Language Models for Software Engineering: A Reproducibility Crisis
Mohammed Latif Siddiq, Arvin Islam-Gomes, Natalie Sekerak, Joanna C. S. Santos

TL;DR
This study investigates reproducibility practices in large language model (LLM)-based software engineering research, revealing persistent gaps and proposing a maturity model to improve reproducibility standards.
Contribution
The paper provides the first large-scale empirical analysis of reproducibility in LLM-for-SE research and introduces a Reproducibility Maturity Model intended to improve reproducibility standards.
Findings
Persistent gaps in artifact availability and documentation
Artifact badges do not guarantee reproducibility quality
Modest improvements in reproducibility practices over time
Abstract
Reproducibility is a cornerstone of scientific progress, yet its state in large language model (LLM)-based software engineering (SE) research remains poorly understood. This paper presents the first large-scale, empirical study of reproducibility practices in LLM-for-SE research. We systematically mined and analyzed 640 papers published between 2017 and 2025 across premier software engineering, machine learning, and natural language processing venues, extracting structured metadata from publications, repositories, and documentation. Guided by four research questions, we examine (i) the prevalence of reproducibility smells, (ii) how reproducibility has evolved over time, (iii) whether artifact evaluation badges reliably reflect reproducibility quality, and (iv) how publication venues influence transparency practices. Using a taxonomy of seven smell categories: Code and Execution, Data,…
Taxonomy
Topics: Scientific Computing and Data Management · Software Engineering Research · Machine Learning in Materials Science
