From Illusion to Insight: Change-Aware File-Level Software Defect Prediction Using Agentic AI
Mohsen Hesamolhokama, Behnam Rohani, Amirahmad Shafiee, MohammadAmin Fazli, Jafar Habibi

TL;DR
This paper introduces a change-aware approach to file-level software defect prediction using agentic AI, addressing biases in traditional models by reasoning over code changes across versions, leading to more balanced and sensitive defect detection.
Contribution
It reformulates defect prediction as a change-aware task and proposes a multi-agent debate framework driven by large language models to improve defect detection accuracy.
Findings
Traditional models show inflated F1 scores but fail on rare defect transitions.
Change-aware reasoning improves detection sensitivity to defect introductions.
The proposed framework achieves balanced performance across software evolution subsets.
Abstract
Much of the reported progress in file-level software defect prediction (SDP) is, in reality, nothing but an illusion of accuracy. Over the last decades, machine learning and deep learning models have reported increasing performance across software versions. However, since most files persist across releases and retain their defect labels, standard evaluation rewards label-persistence bias rather than reasoning about code changes. To address this issue, we reformulate SDP as a change-aware prediction task, in which models reason over code changes of a file within successive project versions, rather than relying on static file snapshots. Building on this formulation, we propose an LLM-driven, change-aware, multi-agent debate framework. Our experiments on multiple PROMISE projects show that traditional models achieve inflated F1, while failing on rare but critical defect-transition cases.…
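The label-persistence effect described in the abstract can be illustrated with a small sketch. It is not the paper's evaluation code: the file names, the toy labels, the four subset names, and the helpers `f1_score`, `persistence_baseline`, `transition_subsets`, and `recall_on` are all hypothetical, chosen only to show how a predictor that copies each file's previous label can score a high overall F1 while detecting none of the newly introduced defects.

```python
from typing import Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for the defective class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def persistence_baseline(
    prev: Dict[str, int], curr: Dict[str, int]
) -> Tuple[List[str], List[int], List[int]]:
    """Predict each file's current label by copying its previous label."""
    files = sorted(set(prev) & set(curr))  # only files present in both versions
    return files, [curr[f] for f in files], [prev[f] for f in files]


def transition_subsets(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, List[str]]:
    """Group persisting files by (previous label, current label)."""
    names = {
        (0, 0): "clean->clean",
        (0, 1): "clean->defective",
        (1, 0): "defective->clean",
        (1, 1): "defective->defective",
    }
    subsets: Dict[str, List[str]] = {name: [] for name in names.values()}
    for f in sorted(set(prev) & set(curr)):
        subsets[names[(prev[f], curr[f])]].append(f)
    return subsets


def recall_on(files: List[str], curr: Dict[str, int], pred: Dict[str, int]) -> float:
    """Recall of the defective class restricted to the given files."""
    positives = [f for f in files if curr[f] == 1]
    if not positives:
        return 0.0
    return sum(pred[f] for f in positives) / len(positives)


# Toy labels (assumed, not from the paper): 1 = defective, 0 = clean.
prev_labels = {"A.java": 1, "B.java": 1, "C.java": 1, "D.java": 1, "E.java": 1,
               "F.java": 1, "G.java": 0, "H.java": 0, "I.java": 0, "J.java": 1}
curr_labels = {"A.java": 1, "B.java": 1, "C.java": 1, "D.java": 1, "E.java": 1,
               "F.java": 1, "G.java": 0, "H.java": 0, "I.java": 1, "J.java": 0}

files, y_true, y_pred = persistence_baseline(prev_labels, curr_labels)
overall_f1 = f1_score(y_true, y_pred)

subsets = transition_subsets(prev_labels, curr_labels)
# The baseline's prediction for a file is simply its previous label.
intro_recall = recall_on(subsets["clean->defective"], curr_labels, prev_labels)

print(f"overall F1 of persistence baseline: {overall_f1:.3f}")
print(f"recall on clean->defective files:   {intro_recall:.3f}")
```

On this toy data, eight of the ten files keep their label, so the copy-the-past baseline looks strong overall while missing the single defect introduction. Reporting metrics per transition subset, as in the sketch, is one simple way to expose that gap.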
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Advanced Software Engineering Methodologies
