Retrospective: Data Mining Static Code Attributes to Learn Defect Predictors

Tim Menzies

arXiv:2501.15662·cs.SE·January 28, 2025

TL;DR

This paper reflects on the impact and legacy of a highly influential study that used static code attributes for defect prediction, and whose citations peaked in 2016. It emphasizes the importance of sharing data and scripts in software engineering research.

Contribution

It provides a retrospective analysis of a seminal work that introduced a baseline for defect prediction using static code attributes from NASA projects.

Findings

01

At its peak (2016), the original paper was SE's most cited paper (per month).

02

By 2018, 20% of leading TSE papers (according to Google Scholar Metrics) incorporated artifacts introduced and disseminated by the study.

03

Publishing a baseline result together with the data and scripts needed to reproduce it is presented as a key driver of research impact.

Abstract

Industry can get any research it wants, just by publishing a baseline result along with the data and scripts needed to reproduce that work. For instance, the paper "Data Mining Static Code Attributes to Learn Defect Predictors" presented such a baseline, using static code attributes from NASA projects. Those results were enthusiastically embraced by a software engineering research community hungry for data. At its peak (2016), this paper was SE's most cited paper (per month). By 2018, twenty percent of leading TSE papers (according to Google Scholar Metrics) incorporated artifacts introduced and disseminated by this research. This brief note reflects on what we should remember, and what we should forget, from that paper.

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Machine Learning and Data Classification · Software Engineering Research