VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection

Yuzhou Nie; Hongwei Li; Chengquan Guo; Ruizhe Jiang; Zhun Wang; Bo Li; Dawn Song; Wenbo Guo

arXiv:2512.07533 · cs.CR · December 9, 2025

TL;DR

VulnLLM-R is a seven-billion-parameter reasoning LLM specialized for vulnerability detection. It reasons about program states instead of matching surface patterns, outperforms SOTA static analysis tools and reasoning models on Python, C/C++, and Java datasets, and detects zero-day vulnerabilities in real-world repositories.

Contribution

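The paper introduces VulnLLM-R, which the authors describe as the first specialized reasoning LLM for vulnerability detection. It is trained with a novel recipe (specialized data selection, reasoning data generation, reasoning data filtering and correction, and testing-phase optimization) and paired with an agent scaffold for use on real-world code.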

Findings

01. VulnLLM-R outperforms SOTA static analysis tools and reasoning models.

02. The model detects zero-day vulnerabilities in real-world repositories.

03. A new training methodology enhances reasoning capabilities in vulnerability detection.

Abstract

We propose VulnLLM-R, the first specialized reasoning LLM for vulnerability detection. Our key insight is that LLMs can reason about program states and analyze potential vulnerabilities, rather than rely on simple pattern matching. This can improve the model's generalizability and prevent it from learning shortcuts. However, SOTA reasoning LLMs are typically ultra-large, closed-source, or limited in their vulnerability detection performance. To address this, we propose a novel training recipe with specialized data selection, reasoning data generation, reasoning data filtering and correction, and testing-phase optimization. Using our proposed methodology, we train a reasoning model with seven billion parameters. Through extensive experiments on SOTA datasets across Python, C/C++, and Java, we show that VulnLLM-R is more effective and efficient than SOTA static analysis tools and both…

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Information and Cyber Security