Source Code Metrics for Software Defects Prediction

Dominik Arne Rebro; Bruno Rossi; Stanislav Chren

arXiv:2301.08022 · cs.SE · January 20, 2023 · 1 citation


Open Access

TL;DR

This study evaluates the effectiveness of various source code metrics in predicting software defects using empirical data from Java projects, highlighting the most impactful metrics and classifiers.

Contribution

It provides an empirical assessment of source code metrics' impact on defect prediction models and compares different classifiers on a large dataset.

Findings

01. Decision Tree and Random Forest classifiers perform best.

02. NOC, NPA, DIT, and LCOM5 are highly influential metrics.

03. The CBO metric does not significantly improve prediction models.

Abstract

In current research, there are contrasting results about the applicability of software source code metrics as features for defect prediction models. The goal of the paper is to evaluate the adoption of software metrics in models for software defect prediction, identifying the impact of individual source code metrics. With an empirical study on 275 release versions of 39 Java projects mined from GitHub, we compute 12 software metrics and collect software defect information. We train and compare three defect classification models. The results across all projects indicate that Decision Tree (DT) and Random Forest (RF) classifiers show the best results. Among the highest-performing individual metrics are NOC, NPA, DIT, and LCOM5. Other metrics, such as CBO, do not bring significant improvements to the models.
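
To make two of the metrics named above concrete, here is a minimal sketch of how DIT (depth of inheritance tree) and NOC (number of children, i.e. direct subclasses) could be computed from a class hierarchy. It is not the tooling used in the paper. The `hierarchy` mapping, the class names, and the helper functions `dit` and `noc` are hypothetical. The sketch assumes single inheritance and counts a root class as depth 0; some tools count the implicit language root (such as `java.lang.Object`) and report values one higher.

```python
from collections import defaultdict


def dit(cls, parent):
    """Depth of Inheritance Tree: number of ancestors above cls."""
    depth = 0
    # Walk up the superclass chain until a root (parent None) is reached.
    while parent.get(cls) is not None:
        cls = parent[cls]
        depth += 1
    return depth


def noc(parent):
    """Number of Children: count of direct subclasses for every class."""
    counts = defaultdict(int)
    for child, par in parent.items():
        if par is not None:
            counts[par] += 1
    # Report 0 for classes that have no subclasses.
    return {cls: counts[cls] for cls in parent}


# Hypothetical hierarchy: class -> direct superclass (None marks a root).
hierarchy = {
    "Shape": None,
    "Polygon": "Shape",
    "Circle": "Shape",
    "Square": "Polygon",
}

dit_values = {cls: dit(cls, hierarchy) for cls in hierarchy}
noc_values = noc(hierarchy)

print(dit_values)
print(noc_values)
```

Per-class values like these would form two of the feature columns fed to a defect classifier, alongside the other metrics.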


Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research