Predicting post-release defects with knowledge units (KUs) of programming languages: an empirical study
Md Ahasanuzzaman, Gustavo A. Oliva, Ahmed E. Hassan, and Zhen Ming (Jack) Jiang

TL;DR
This empirical study introduces Knowledge Units (KUs) derived from programming languages as a novel feature set for defect prediction, demonstrating their effectiveness and complementarity to traditional metrics in predicting post-release defects in Java systems.
Contribution
The paper proposes and empirically evaluates KUs as a new data source for defect prediction, showing that KU-based models outperform models built on traditional metrics and that a reduced feature set still yields cost-effective models.
Findings
KUs achieve a median AUC of 0.82, outperforming traditional metrics.
Combining KUs with traditional metrics improves prediction to a median AUC of 0.89.
A 10-feature model maintains strong performance with reduced costs.
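To make the "median AUC" figures above concrete, here is a minimal sketch of how such a summary could be computed from several evaluation runs. It is not the authors' pipeline: the function names `auc` and `median_auc` and the toy labels and scores are assumptions made for illustration only.

```python
from statistics import median


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen defective
    file (label 1) gets a higher score than a randomly chosen clean file
    (label 0). Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_auc(runs):
    """Median AUC over a list of (labels, scores) evaluation runs."""
    return median(auc(labels, scores) for labels, scores in runs)


# Toy data (hypothetical): three evaluation runs of one model.
runs = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]),  # AUC 0.75
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1]),  # AUC 1.0
    ([0, 1, 0, 1], [0.5, 0.5, 0.1, 0.9]),  # AUC 0.875
]
print(median_auc(runs))  # 0.875
```

Reporting the median rather than the mean keeps a single unusually good or bad run from dominating the comparison between feature sets.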
Abstract
Defect prediction plays a crucial role in software engineering, enabling developers to identify defect-prone code and improve software quality. While extensive research has focused on refining machine learning models for defect prediction, the exploration of new data sources for feature engineering remains limited. Defect prediction models primarily rely on traditional metrics such as product, process, and code ownership metrics, which, while effective, do not capture language-specific traits that may influence defect proneness. To address this gap, we introduce Knowledge Units (KUs) of programming languages as a novel feature set for analyzing software systems and defect prediction. A KU is a cohesive set of key capabilities that are offered by one or more building blocks of a given programming language. We conduct an empirical study leveraging 28 KUs that are derived from Java…
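As a rough illustration of what a KU-based feature vector might look like, the sketch below counts occurrences of a few Java constructs in a source file and groups them under KU-like names. Everything here is an assumption: the three KU names, the token patterns, and the `ku_counts` helper are hypothetical, and crude keyword matching stands in for the proper source-code analysis a real study would require.

```python
import re

# Hypothetical KU definitions: each name maps to Java tokens that hint at
# the use of that group of language capabilities. Not the paper's KU list.
HYPOTHETICAL_KUS = {
    "exception_handling": [r"\btry\b", r"\bcatch\b", r"\bthrow\b", r"\bfinally\b"],
    "concurrency": [r"\bsynchronized\b", r"\bThread\b", r"\bExecutorService\b"],
    "generics": [r"\bList<", r"\bMap<"],
}


def ku_counts(java_source):
    """Return one count per hypothetical KU for a single Java source file."""
    return {
        name: sum(len(re.findall(pattern, java_source)) for pattern in patterns)
        for name, patterns in HYPOTHETICAL_KUS.items()
    }


SAMPLE = """
public class Loader {
    synchronized void load(List<String> paths) {
        try {
            read(paths);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
"""

print(ku_counts(SAMPLE))
# {'exception_handling': 3, 'concurrency': 1, 'generics': 1}
```

Each file's counts would then form one row of a feature table, which could be used on its own or placed alongside traditional product and process metrics.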
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques
