Method-Level Bug Severity Prediction using Source Code Metrics and LLMs
Ehsan Mashhadi, Hossein Ahmadvand, Hadi Hemmati

TL;DR
This paper explores the use of source code metrics, large language models, and their combination to predict bug severity at the method level, demonstrating significant improvements over traditional models.
Contribution
It introduces novel architectures combining source code metrics with CodeBERT, significantly enhancing bug severity prediction accuracy.
Findings
Fine-tuning CodeBERT improves prediction results by 29%-140%.
Among the eight machine-learning models trained on method-level source code metrics, Decision Tree and Random Forest perform best.
Combining metrics with CodeBERT further boosts performance.
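One simple way to picture "combining metrics with CodeBERT" is feature fusion: scale the method's metric vector and append it to the code embedding before classification. The sketch below illustrates only that idea. It is not the paper's architecture, which this summary does not describe. The embedding is a random placeholder rather than real CodeBERT output, and the metric names, the four severity classes, and the helpers `fuse` and `predict_proba` are hypothetical.

```python
import numpy as np

EMBED_DIM = 768   # hidden size of CodeBERT-base
NUM_CLASSES = 4   # hypothetical number of severity labels


def fuse(code_embedding, metrics, metric_mean, metric_std):
    """Standardize the metric vector and append it to the code embedding."""
    scaled = (metrics - metric_mean) / (metric_std + 1e-8)
    return np.concatenate([code_embedding, scaled])


def predict_proba(features, weights, bias):
    """Linear layer followed by a numerically stable softmax."""
    logits = features @ weights + bias
    logits = logits - logits.max()
    exp = np.exp(logits)
    return exp / exp.sum()


rng = np.random.default_rng(0)

# Placeholder standing in for a pooled CodeBERT embedding of one method.
code_embedding = rng.normal(size=EMBED_DIM)

# Hypothetical metrics: lines of code, cyclomatic complexity, nesting depth.
metrics = np.array([42.0, 7.0, 3.0])
metric_mean = np.array([30.0, 5.0, 2.0])   # assumed training-set statistics
metric_std = np.array([20.0, 4.0, 1.5])

features = fuse(code_embedding, metrics, metric_mean, metric_std)

# Untrained, randomly initialized classifier head, for shape illustration only.
weights = rng.normal(scale=0.01, size=(features.size, NUM_CLASSES))
bias = np.zeros(NUM_CLASSES)

probs = predict_proba(features, weights, bias)
print(features.shape, probs.shape)
```

Scaling the metrics first keeps a large raw value such as lines of code from dominating the much smaller embedding components. In a real model the classifier head would be trained jointly with, or on top of, the fine-tuned encoder.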
Abstract
In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs. However, most existing work in this domain treats all bugs the same, which is not the case in practice. It is important for a defect prediction method to estimate the severity of the identified bugs so that the higher-severity ones get immediate attention. In this study, we investigate source code metrics, source code representation using large language models (LLMs), and their combination in predicting bug severity labels of two prominent datasets. We leverage several source metrics at method-level granularity to train eight different machine-learning models. Our results suggest that Decision Tree and Random Forest models outperform other models regarding our several evaluation metrics. We then use the pre-trained CodeBERT LLM to study the source code representations'…
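The metric-based part of the study can be sketched as a standard supervised pipeline: each method becomes a row of numeric metrics, and a classifier predicts its severity label. The example below is a minimal illustration using scikit-learn on synthetic data. The three metrics, the synthetic labeling rule, the hyperparameters, and the weighted F1 score are assumptions made for the sketch and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_methods = 400

# Synthetic method-level metrics (hypothetical choice of features).
X = np.column_stack([
    rng.integers(1, 200, n_methods),  # lines of code
    rng.integers(1, 30, n_methods),   # cyclomatic complexity
    rng.integers(0, 8, n_methods),    # maximum nesting depth
]).astype(float)

# Synthetic severity labels 0..3, loosely tied to complexity plus noise.
y = np.digitize(X[:, 1] + rng.normal(0, 3, n_methods), bins=[8, 16, 24])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # Weighted F1 accounts for class imbalance across severity levels.
    scores[name] = f1_score(y_test, predictions, average="weighted")
    print(f"{name}: weighted F1 = {scores[name]:.3f}")
```

Tree-based models need no feature scaling, which makes them a convenient baseline when metrics live on very different ranges. The scores printed here reflect only the synthetic data and say nothing about the paper's results.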
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
