VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization
Youpeng Li, Fuxun Yu, Xinda Wang

TL;DR
VULPO introduces an on-policy reinforcement learning framework for vulnerability detection that incorporates repository-level context, outperforming prompt-engineering and off-policy baselines and enabling more accurate, comprehensive vulnerability analysis.
Contribution
The paper presents VULPO, a novel on-policy LLM reinforcement learning approach for context-aware vulnerability detection. It comes with a new dataset, ContextVul, and a multi-dimensional reward structure.
Findings
VULPO-4B outperforms prompt-engineering and off-policy baselines.
It achieves an 85% F1 improvement over Qwen3-4B.
It performs comparably to DeepSeek-R1-0528, a model roughly 150x larger.
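The findings report detection quality as F1. For binary vulnerability labels, F1 is the harmonic mean of precision and recall. The sketch below computes it and expresses a gain as a relative improvement, assuming the "85% F1 improvement" figure is relative rather than an absolute gain in points. The labels and predictions are hypothetical and are not data from the paper, and the helper names are illustrative.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 = vulnerable and 0 = not vulnerable."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`; 0.85 would mean +85%."""
    return (new - baseline) / baseline


# Hypothetical ground truth and predictions from two detectors (illustrative only).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_preds = [1, 0, 0, 0, 1, 0, 0, 0]
improved_preds = [1, 1, 1, 0, 0, 0, 1, 0]

f1_base = f1_score(labels, baseline_preds)
f1_new = f1_score(labels, improved_preds)
print(f"baseline F1={f1_base:.3f}, improved F1={f1_new:.3f}, "
      f"relative gain={relative_improvement(f1_new, f1_base):.0%}")
```

Reporting a relative gain makes improvements over weak baselines look large. Comparing absolute F1 values alongside the relative figure gives a fuller picture.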
Abstract
The widespread reliance on open-source software dramatically increases the risk of vulnerability exploitation, underscoring the need for effective and scalable vulnerability detection (VD). Existing VD techniques, whether traditional machine learning-based or LLM-based approaches like prompt engineering, supervised fine-tuning, or off-policy preference optimization, remain fundamentally limited in their ability to perform context-aware analysis: They depend on fixed inputs or static preference datasets, cannot adaptively explore repository-level dependencies, and are constrained by function-level benchmarks that overlook critical vulnerability context. This paper introduces Vulnerability-Adaptive Policy Optimization (VULPO), an on-policy LLM reinforcement learning framework for context-aware VD. To support training and evaluation, we first construct ContextVul, a new dataset that…
Taxonomy
Topics: Information and Cyber Security · Software Engineering Research · Web Application Security Vulnerabilities
