Evaluating Large Language Models for Security Bug Report Prediction

Farnaz Soltaniani; Shoaib Razzaq; Mohammad Ghafari

arXiv:2601.22921·cs.CR·February 2, 2026

TL;DR

This paper evaluates prompting and fine-tuning of large language models for predicting security bug reports, and highlights the trade-offs between the two in sensitivity, precision, and inference speed.

Contribution

The paper provides a comparative analysis of prompted and fine-tuned LLM approaches to security bug report prediction, showing the strengths and limitations of each.

Findings

01

Prompted proprietary models are the most sensitive to SBRs (average recall 74%, G-measure 77%), but their average precision is only 22%.

02

Fine-tuned models achieve higher precision and faster inference.

03

Trade-offs exist between sensitivity, precision, and inference speed.

Abstract

Early detection of security bug reports (SBRs) is critical for timely vulnerability mitigation. We present an evaluation of prompt-based engineering and fine-tuning approaches for predicting SBRs using Large Language Models (LLMs). Our findings reveal a distinct trade-off between the two approaches. Prompted proprietary models demonstrate the highest sensitivity to SBRs, achieving a G-measure of 77% and a recall of 74% on average across all the datasets, albeit at the cost of a higher false-positive rate, resulting in an average precision of only 22%. Fine-tuned models, by contrast, exhibit the opposite behavior, attaining a lower overall G-measure of 51% but substantially higher precision of 75% at the cost of reduced recall of 36%. Though a one-time investment in building fine-tuned models is necessary, the inference on the largest dataset is up to 50 times faster than that of…
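The metrics quoted in the abstract can be illustrated with a short sketch. The abstract does not define G-measure; the code below assumes the definition commonly used in SBR prediction work, the harmonic mean of recall and one minus the false-positive rate. The `Confusion` class, the function names, and the counts are hypothetical and chosen only for illustration; they are not taken from the paper's datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary SBR classifier (hypothetical helper)."""
    tp: int  # security bug reports correctly flagged
    fp: int  # non-security reports wrongly flagged
    tn: int  # non-security reports correctly passed
    fn: int  # security bug reports missed


def recall(c: Confusion) -> float:
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def precision(c: Confusion) -> float:
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def false_positive_rate(c: Confusion) -> float:
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


def g_measure(c: Confusion) -> float:
    # Assumed definition: harmonic mean of recall and (1 - false-positive rate).
    r = recall(c)
    specificity = 1.0 - false_positive_rate(c)
    total = r + specificity
    return 2.0 * r * specificity / total if total else 0.0


# Made-up counts: 1,000 reports, of which 50 are SBRs.
example = Confusion(tp=37, fp=131, tn=819, fn=13)
print(f"recall={recall(example):.2f}")
print(f"precision={precision(example):.2f}")
print(f"g_measure={g_measure(example):.2f}")
```

When SBRs are rare, a classifier can keep a low false-positive rate, and therefore a high G-measure, while still flagging more non-security reports than security ones. Under the assumed definition, that is how high recall and G-measure can coexist with low precision.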

Taxonomy

Topics: Advanced Malware Detection Techniques · Information and Cyber Security · Software Engineering Research