Can GPT-4 Replicate Empirical Software Engineering Research?
Jenny T. Liang, Carmen Badea, Christian Bird, Robert DeLine, Denae Ford, Nicole Forsgren, Thomas Zimmermann

TL;DR
This study evaluates GPT-4's ability to replicate empirical software engineering research. It finds that GPT-4 can surface correct assumptions and generate high-level analysis code, but struggles with implementation details and with applying common knowledge.
Contribution
The paper demonstrates GPT-4's potential to assist in replicating empirical software engineering studies and highlights its current limitations in detailed coding and domain-specific knowledge.
Findings
GPT-4 surfaces correct research assumptions
GPT-4 generates high-level analysis logic
GPT-4 produces code with implementation errors
Abstract
Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners could benefit from replicating research on their own data, this poses its own set of challenges, since performing replications requires a deep understanding of research methodologies and subtle nuances in software engineering data. Given that large language models (LLMs), such as GPT-4, show promise in tackling both software engineering- and science-related tasks, these models could help replicate and thus democratize empirical software engineering research. In this paper, we examine GPT-4's abilities to perform replications of empirical software engineering research…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software System Performance and Reliability
Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization
