A Call for Critically Rethinking and Reforming Data Analysis in Empirical Software Engineering
Matteo Esposito, Mikel Robredo, Murali Sridharan, Guilherme Horta Travassos, Rafael Peñaloza, Valentina Lenarduzzi

TL;DR
This paper highlights widespread statistical issues in empirical software engineering research, demonstrating experts' limited ability to detect these problems, and calls for a fundamental reform in data analysis practices within the field.
Contribution
It provides a large-scale analysis of three decades of SE research, revealing persistent methodological flaws and advocating for a critical overhaul of data analysis standards.
Findings
Significant statistical issues found in primary studies
Experts showed limited ability to detect and fix statistical problems
The study advocates for reforming data analysis practices in ESE
Abstract
Context: Empirical Software Engineering (ESE) drives innovation in SE through qualitative and quantitative studies. However, concerns about the correct application of empirical methodologies have existed since the 2006 Dagstuhl seminar on SE. Objective: To analyze three decades of SE research, identify mistakes in statistical methods, and evaluate experts' ability to detect and address these issues. Methods: We conducted a literature survey of ~27,000 empirical studies, using LLMs to classify statistical methodologies as adequate or inadequate. Additionally, we selected 30 primary studies and held a workshop with 33 ESE experts to assess their ability to identify and resolve statistical issues. Results: Significant statistical issues were found in the primary studies, and experts showed limited ability to detect and correct these methodological problems, raising concerns about the…
Taxonomy
Topics: Big Data and Business Intelligence · Scientific Computing and Data Management
