Learning to Generate Secure Code via Token-Level Rewards
Jiazheng Quan, Xiaodong Li, Bin Wang, Guo An, Like Liu, Degen Huang, Lin Liu, Chengbin Hou

TL;DR
This paper introduces Vul2Safe, a framework that uses LLM self-reflection to build high-confidence vulnerability repair data, and SRCode, a reinforcement learning method that uses token-level rewards, leading to fewer vulnerabilities and better code quality in code generated by large language models.
Contribution
The paper presents SRCode, a novel token-level reinforcement learning method, and the PrimeVul+ dataset, advancing secure code generation by reinforcing fine-grained security patterns.
Findings
Significant reduction in security vulnerabilities in generated code.
Improved code quality across multiple benchmarks.
Enhanced model focus on critical security patterns.
Abstract
Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained reinforcement learning reward signals. To address these challenges, we propose Vul2Safe, a new secure code generation framework that leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, and further generates diverse implicit prompts to build the PrimeVul+ dataset. Meanwhile, we introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security, which enables the model to continuously attend to and reinforce critical fine-grained security patterns during training. Compared with traditional instance-level…
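To make the contrast between instance-level and token-level rewards concrete, here is a minimal sketch. It is not SRCode's actual reward computation, which the truncated abstract does not specify. The function names, the set of "security-critical" tokens, and the bonus value are all hypothetical, chosen only to show how a single sequence score differs from a per-token signal.

```python
from typing import List, Set


def instance_level_rewards(num_tokens: int, sequence_reward: float) -> List[float]:
    """Broadcast one sequence-level score to every token (coarse-grained)."""
    return [sequence_reward] * num_tokens


def token_level_rewards(
    tokens: List[str],
    sequence_reward: float,
    security_critical: Set[str],  # hypothetical: how such tokens are found is an assumption
    bonus: float = 1.0,           # hypothetical extra reward for critical tokens
) -> List[float]:
    """Add an extra reward to tokens flagged as security-critical (fine-grained)."""
    return [
        sequence_reward + (bonus if tok in security_critical else 0.0)
        for tok in tokens
    ]


# Toy tokenization of a bounded string copy; purely illustrative.
tokens = ["strncpy", "(", "dst", ",", "src", ",", "sizeof", "(", "dst", ")", "-", "1", ")"]
critical = {"strncpy", "sizeof"}

inst = instance_level_rewards(len(tokens), 0.5)
tok = token_level_rewards(tokens, 0.5, critical)

for t, r_inst, r_tok in zip(tokens, inst, tok):
    print(f"{t:>8}  instance={r_inst:.1f}  token={r_tok:.1f}")
```

In this toy setup every token receives the same score under the instance-level scheme, while the token-level scheme singles out the tokens that carry the security-relevant decision.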
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Security and Verification in Computing
