From Description to Score: Can LLMs Quantify Vulnerabilities?

Sima Jafarikhah; Daniel Thompson; Eva Deans; Hossein Siadati; Yi Liu

arXiv:2512.06781·cs.CR·January 6, 2026

Open Access

TL;DR

This paper evaluates the ability of large language models to automate vulnerability scoring, finding they outperform baselines on some metrics but face challenges due to ambiguous descriptions and limited context.

Contribution

It demonstrates the potential and limitations of LLMs like ChatGPT and Llama in automating CVE scoring, highlighting the need for richer vulnerability descriptions.

Findings

01. LLMs outperform baseline on certain CVSS metrics

02. Model performance varies across LLM families and metrics

03. Ambiguous CVE descriptions hinder accurate classification

Abstract

Manual vulnerability scoring, such as assigning Common Vulnerability Scoring System (CVSS) scores, is a resource-intensive process that is often influenced by subjective interpretation. This study investigates the potential of general-purpose large language models (LLMs), namely ChatGPT, Llama, Grok, DeepSeek, and Gemini, to automate this process by analyzing over 31,000 recent Common Vulnerabilities and Exposures (CVE) entries. The results show that LLMs substantially outperform the baseline on certain metrics (e.g., Availability Impact), while offering more modest gains on others (e.g., Attack Complexity). Moreover, model performance varies across both LLM families and individual CVSS metrics, with ChatGPT-5 attaining the highest precision. Our analysis reveals that LLMs tend to misclassify many of the same CVEs, and ensemble-based meta-classifiers only marginally…
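The per-metric comparison against a baseline described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say what the baseline is, so a majority-class baseline is swapped in; the labels follow CVSS v3.1 abbreviations (the version used in the study is not stated here); and the toy data and the helper names `accuracy`, `majority_baseline`, and `gain_over_baseline` are hypothetical, not taken from the paper.

```python
from collections import Counter

# Hypothetical toy labels for two CVSS base metrics (NOT data from the paper).
# Attack Complexity takes L/H; Availability Impact takes H/L/N in CVSS v3.1.
truth = {
    "Attack Complexity": ["L", "L", "L", "H", "L"],
    "Availability Impact": ["H", "N", "H", "L", "N"],
}
predicted = {
    "Attack Complexity": ["L", "L", "H", "H", "L"],
    "Availability Impact": ["H", "N", "H", "N", "N"],
}


def accuracy(gold, pred):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def majority_baseline(gold):
    """Assumed baseline: always predict the most frequent gold label."""
    most_common_label, _ = Counter(gold).most_common(1)[0]
    return [most_common_label] * len(gold)


def gain_over_baseline(gold, pred):
    """Accuracy of the predictions minus accuracy of the majority baseline."""
    return accuracy(gold, pred) - accuracy(gold, majority_baseline(gold))


if __name__ == "__main__":
    for metric, gold in truth.items():
        gain = gain_over_baseline(gold, predicted[metric])
        print(f"{metric}: gain over majority baseline = {gain:+.2f}")
```

On a skewed metric, a majority-class baseline is already hard to beat, which is one plausible reason gains could differ across metrics; the made-up numbers above only show the calculation and say nothing about the paper's actual results.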

Taxonomy

Topics: Information and Cyber Security · Adversarial Robustness in Machine Learning · Web Application Security Vulnerabilities