Prompting the Priorities: A First Look at Evaluating LLMs for Vulnerability Triage and Prioritization
Osama Al Haddad, Muhammad Ikram, Ejaz Ahmed, Young Lee

TL;DR
This study evaluates four large language models (ChatGPT, Claude, Gemini, and DeepSeek) across twelve prompting techniques as aids to security analysts in vulnerability triage. It finds moderate effectiveness and concludes that LLMs cannot currently replace expert judgment.
Contribution
The paper provides a comprehensive assessment of LLMs' capabilities in vulnerability prioritization, highlighting their strengths and limitations under specific prompting strategies.
Findings
Gemini outperformed other models on most decision points.
Prompting with exemplars improved model accuracy.
All models tended to over-predict risk.
Abstract
Security analysts face increasing pressure to triage large and complex vulnerability backlogs. Large Language Models (LLMs) offer a potential aid by automating parts of the interpretation process. We evaluate four models (ChatGPT, Claude, Gemini, and DeepSeek) across twelve prompting techniques to interpret semi-structured and unstructured vulnerability information. As a concrete use case, we test each model's ability to predict decision points in the Stakeholder-Specific Vulnerability Categorization (SSVC) framework: Exploitation, Automatable, Technical Impact, and Mission and Wellbeing. Using 384 real-world vulnerabilities from the VulZoo dataset, we issued more than 165,000 queries to assess performance under prompting styles including one-shot, few-shot, and chain-of-thought. We report F1 scores for each SSVC decision point and Cohen's kappa (weighted and unweighted) for the final…
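The abstract names F1 and Cohen's kappa (weighted and unweighted) as its metrics. As a rough illustration of how those quantities are computed, and not the authors' evaluation code, here is a minimal pure-Python sketch. The function names, the toy labels, and the choice of linear weights are assumptions made for this example; the paper may use a different weighting scheme (for example quadratic) or an existing library.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred, labels, weighted=False):
    """Cohen's kappa; `labels` must be listed in ordinal order.

    With weighted=True, disagreements are penalized linearly by how many
    ordinal steps apart the two labels are (an assumption of this sketch).
    """
    index = {label: i for i, label in enumerate(labels)}
    n, k = len(y_true), len(labels)
    observed = Counter((index[t], index[p]) for t, p in zip(y_true, y_pred))
    true_counts = Counter(index[t] for t in y_true)
    pred_counts = Counter(index[p] for p in y_pred)

    def weight(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    # Observed and chance-expected disagreement, both in [0, 1].
    obs = sum(weight(i, j) * c for (i, j), c in observed.items()) / n
    exp = sum(
        weight(i, j) * true_counts[i] * pred_counts[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)
    if exp == 0:
        return float("nan")  # undefined when only one label ever occurs
    return 1.0 - obs / exp


# Toy data (hypothetical): the model over-predicts by one level twice.
levels = ["low", "medium", "high"]
truth = ["low", "medium", "high", "low", "medium", "high"]
preds = ["medium", "high", "high", "low", "medium", "high"]

print(macro_f1(truth, preds, levels))
print(cohen_kappa(truth, preds, levels))
print(cohen_kappa(truth, preds, levels, weighted=True))
```

On ordinal outcomes, the weighted variant penalizes a near miss less than a distant one, so a model that over-predicts risk by a single level scores higher on weighted kappa than on unweighted kappa.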
Taxonomy
Topics: Information and Cyber Security · Artificial Intelligence in Healthcare and Education · Adversarial Robustness in Machine Learning
