Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data
Xin-Cheng Wen, Yijun Yang, Cuiyun Gao, Yang Xiao, Deheng Ye

TL;DR
This paper introduces ReVD, a framework that improves the vulnerability detection ability of large language models by synthesizing reasoning data and applying vulnerability-specific preference optimization, achieving state-of-the-art results.
Contribution
ReVD is the first framework to synthesize high-quality reasoning data and apply curriculum preference optimization for improved vulnerability detection in LLMs.
Findings
ReVD improves accuracy by 12.24%-22.77% on the PrimeVul and SVEN datasets.
ReVD outperforms existing methods in vulnerability detection tasks.
Synthetic reasoning data enhances the model's ability to recognize vulnerability patterns.
Abstract
Large language models (LLMs) demonstrate considerable proficiency in numerous coding-related tasks; however, their capabilities in detecting software vulnerabilities remain limited. This limitation primarily stems from two factors: (1) the absence of reasoning data related to vulnerabilities, which hinders the models' ability to capture underlying vulnerability patterns; and (2) their focus on learning semantic representations rather than the reason behind them, thus failing to recognize semantically similar vulnerability samples. Furthermore, the development of LLMs specialized in vulnerability detection is challenging, particularly in environments characterized by the scarcity of high-quality datasets. In this paper, we propose a novel framework ReVD that excels at mining vulnerability patterns through reasoning data synthesizing and vulnerability-specific preference optimization.…
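As a rough illustration of what preference optimization over reasoning data can look like, the sketch below computes the standard Direct Preference Optimization (DPO) loss for one preference pair and orders pairs from easy to hard. This is a generic stand-in, not ReVD's actual algorithm, which the truncated abstract does not specify. The function names, the `difficulty` field, and the default `beta=0.1` are assumptions made for illustration.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") responses under the policy and under a frozen
    reference model. `beta` is a hypothetical default, not a value from the paper.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); avoid overflow for very
    # negative margins, where the loss is approximately -margin.
    if margin < -30:
        return -margin
    return math.log1p(math.exp(-margin))


def curriculum_order(pairs):
    """Sort preference pairs from easy to hard.

    Each pair is a dict with a hypothetical numeric `difficulty` key; how such
    a score would be derived is an assumption left open here.
    """
    return sorted(pairs, key=lambda p: p["difficulty"])


if __name__ == "__main__":
    pairs = [
        {"id": "b", "difficulty": 0.9},
        {"id": "a", "difficulty": 0.2},
    ]
    for pair in curriculum_order(pairs):
        print(pair["id"])
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls as the policy favors the chosen response more strongly than the reference does.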
Taxonomy
Topics: Information and Cyber Security · Software Engineering Research · Advanced Malware Detection Techniques
