Refactoring ≠ Bug-Inducing: Improving Defect Prediction with Code Change Tactics Analysis
Feifei Niu, Junqian Shao, Christoph Mayr-Dorn, Liguo Huang, Wesley K. G. Assunção, Chuanyi Li, Jidong Ge, Alexander Egyed

TL;DR
This paper highlights the importance of considering code refactoring in just-in-time defect prediction, proposing a new analysis method that improves dataset labeling and model performance by accounting for refactoring activities.
Contribution
It introduces Code chAnge Tactics (CAT) analysis to categorize refactoring, improving defect dataset labeling and enhancing the accuracy of existing JIT defect prediction models.
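To make the labeling idea concrete, here is a minimal sketch of refactoring-aware labeling. It is not the paper's CAT analysis: the `ChangedLine` type, the `is_refactoring` flag, and the `bug_inducing_commits` helper are hypothetical names introduced for illustration, and the sketch assumes that some refactoring detector and a blame step have already annotated each line modified by a bug fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangedLine:
    """A line modified by a bug-fixing commit (hypothetical structure)."""
    commit: str           # commit that last touched this line, per an assumed blame step
    is_refactoring: bool  # True if an assumed refactoring detector flagged the line


def bug_inducing_commits(fix_lines, ignore_refactoring=True):
    """Return the set of commits blamed for the lines a bug fix modified.

    With ignore_refactoring=True, lines flagged as refactoring are skipped,
    so commits reached only through refactoring edits are not labeled
    bug-inducing.
    """
    return {
        line.commit
        for line in fix_lines
        if not (ignore_refactoring and line.is_refactoring)
    }


# Toy data: commit "b2" is reached only through a refactoring line.
fix_lines = [
    ChangedLine("a1", is_refactoring=False),
    ChangedLine("b2", is_refactoring=True),
    ChangedLine("a1", is_refactoring=True),
]

naive = bug_inducing_commits(fix_lines, ignore_refactoring=False)
aware = bug_inducing_commits(fix_lines)
```

In this toy input, the naive pass blames both commits, while the refactoring-aware pass blames only `a1`. That difference is the kind of label change the summary attributes to accounting for refactoring.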
Findings
Refactoring consideration improves dataset labeling accuracy by 13.7%.
Ignoring refactoring reduces model F1-score by up to 37.3%.
Integrating refactoring information boosts baseline model recall and F1-score by up to 43.2% and 32.5%, respectively.
Abstract
Just-in-time defect prediction (JIT-DP) aims to predict the likelihood of code changes resulting in software defects at an early stage. Although code change metrics and semantic features have enhanced prediction accuracy, prior research has largely ignored code refactoring during both the evaluation and methodology phases, despite its prevalence. Refactoring and its propagation often tangle with bug-fixing and bug-inducing changes within the same commit and statement. Neglecting refactoring can introduce bias into the learning and evaluation of JIT-DP models. To address this gap, we investigate the impact of refactoring and its propagation on six state-of-the-art JIT-DP approaches. We propose Code chAnge Tactics (CAT) analysis to categorize code refactoring and its propagation, which improves labeling accuracy in the JIT-Defects4J dataset by 13.7%. Our experiments reveal that failing to…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
