Rethinking Artifact Evaluation for Software Engineering in the Age of Generative AI
Christoph Treude, Christopher M. Poskitt, Rashina Hoda

TL;DR
This paper argues that artifact evaluation should become a core part of peer review in software engineering, particularly now that generative AI has reduced the effort needed to produce polished research narratives.
Contribution
It reframes peer review as an attention allocation problem and emphasizes the importance of artifacts over narrative quality in assessing research rigor.
Findings
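Generative AI has substantially reduced the human effort needed to produce polished research narratives, yet reviewer attention is still often spent on writing quality and literature positioning.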
Artifact evaluation can serve as a more reliable indicator of scientific rigor than narrative quality.
The paper proposes integrating artifact assessment more prominently into peer review.
Abstract
Peer review in software engineering research operates under tight time constraints, while generative AI has substantially reduced the human effort required to produce polished research narratives. Reviewer attention is often spent on aspects of submissions such as writing quality or literature positioning that have become relatively less effort-intensive to address, rather than on evaluating the scientific substance of a paper. At the same time, assessing whether methods are implemented correctly, analyses are sound, and claims are supported by evidence remains effort-intensive and dependent on human expertise. In software engineering research, this substance is frequently embodied in artifacts, including code, data, evidence and analysis samples, and experimental infrastructure. In this position paper, we argue that artifact evaluation should be treated as a first-class component of…
