Problems with SZZ and Features: An empirical study of the state of practice of defect prediction data collection
Steffen Herbold, Alexander Trautsch, Fabian Trautsch, Benjamin Ledel

TL;DR
This empirical study examines how reliable the SZZ algorithm is for defect labeling and how much commonly used feature sets affect defect prediction results. It finds that a large share of SZZ labels are incorrect.
Contribution
The paper provides an empirical analysis of SZZ defect labels and commonly used feature sets across 398 releases of 38 Apache projects, highlighting their limitations and the implications for defect prediction research.
Findings
Only half of the commits that SZZ identifies as bug fixing actually fix bugs.
Combining SZZ with a six-month time frame produces many mislabels: roughly one incorrectly labeled defect for every correctly labeled one.
Adding features beyond the small set available in most defect prediction data sets does not change prediction performance significantly.
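As a rough illustration of the first two findings, both ratios can be read as a precision of about 0.5 for the labels. The sketch below uses hypothetical counts chosen only to match the stated ratios; they are not figures from the paper, and the `precision` helper is an illustrative name, not part of any SZZ implementation.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Return the fraction of positive labels that are correct."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no positive labels to evaluate")
    return true_positives / total


# Hypothetical counts (assumption): half of the commits flagged as
# bug fixing are confirmed as bug fixing by manual validation.
commit_label_precision = precision(true_positives=50, false_positives=50)

# Hypothetical counts (assumption): one incorrectly labeled defect
# for every correctly labeled one under the six-month time frame.
six_month_precision = precision(true_positives=100, false_positives=100)

print(commit_label_precision)  # 0.5
print(six_month_precision)     # 0.5
```

Read this way, a model trained on such labels learns from data in which about every second positive example is wrong, which is why label validation matters before comparing prediction approaches.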
Abstract
Context: The SZZ algorithm is the de facto standard for labeling bug fixing commits and finding inducing changes for defect prediction data. Recent research uncovered potential problems in different parts of the SZZ algorithm. Most defect prediction data sets provide only static code metrics as features, while research indicates that other features are also important. Objective: We provide an empirical analysis of the defect labels created with the SZZ algorithm and the impact of commonly used features on results. Method: We used a combination of manual validation and adopted or improved heuristics for the collection of defect data. We conducted an empirical study on 398 releases of 38 Apache projects. Results: We found that only half of the bug fixing commits determined by SZZ are actually bug fixing. If a six-month time frame is used in combination with SZZ to determine which…
