Retrospective: Data Mining Static Code Attributes to Learn Defect Predictors
Tim Menzies

TL;DR
This paper reflects on the impact and legacy of a highly influential study, most cited at its peak in 2016, that used static code attributes for defect prediction, emphasizing the importance of data sharing in software engineering research.
Contribution
It provides a retrospective analysis of a seminal work that introduced a baseline for defect prediction using static code attributes from NASA projects.
Findings
At its peak in 2016, the paper was SE's most cited paper (per month).
By 2018, 20% of leading TSE papers (according to Google Scholar Metrics) incorporated artifacts introduced and disseminated by the study.
It highlights the significance of data sharing for research impact.
Abstract
Industry can get any research it wants, just by publishing a baseline result along with the data and scripts needed to reproduce that work. For instance, the paper "Data Mining Static Code Attributes to Learn Defect Predictors" presented such a baseline, using static code attributes from NASA projects. Those results were enthusiastically embraced by a software engineering research community hungry for data. At its peak (2016), this paper was SE's most cited paper (per month). By 2018, twenty percent of leading TSE papers (according to Google Scholar Metrics) incorporated artifacts introduced and disseminated by this research. This brief note reflects on what we should remember, and what we should forget, from that paper.
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Machine Learning and Data Classification · Software Engineering Research
