ProSec: Fortifying Code LLMs with Proactive Security Alignment

Xiangzhe Xu; Zian Su; Jinyao Guo; Kaiyuan Zhang; Zhenting Wang; Xiangyu Zhang

arXiv:2411.12882 · cs.CR · June 9, 2025

TL;DR

ProSec introduces a proactive security alignment method for code LLMs by synthesizing vulnerability scenarios, significantly enhancing the models' security without sacrificing utility.

Contribution

ProSec systematically generates vulnerability-inducing coding scenarios from CWEs to improve the security alignment of code LLMs, addressing the sparsity of vulnerable code that constrains earlier security-focused datasets.

Findings

01. Models trained with ProSec are 25.2% to 35.4% more secure.

02. ProSec's synthesized scenarios trigger 25x more vulnerable code.

03. The security-focused dataset is 7x larger than previous work.

Abstract

While recent code-specific large language models (LLMs) have greatly enhanced their code generation capabilities, the safety of these models remains under-explored, posing potential risks as insecure code generated by these models may introduce vulnerabilities into real-world systems. Existing methods collect security-focused datasets from real-world vulnerabilities for instruction tuning in order to mitigate such issues. However, they are largely constrained by the data sparsity of vulnerable code, and have limited applicability in the multi-stage post-training workflows of modern LLMs. In this paper, we propose ProSec, a novel proactive security alignment approach designed to align code LLMs with secure coding practices. ProSec systematically exposes the vulnerabilities in a code LLM by synthesizing vulnerability-inducing coding scenarios from Common Weakness Enumerations (CWEs) and…
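As a concrete illustration of the kind of weakness a CWE-derived coding scenario targets, consider CWE-89 (SQL injection). The sketch below is hypothetical and is not taken from the paper or from the ProSec dataset: the task "look up a user by name", the `users` table, and every function name are assumptions chosen for illustration. It contrasts an insecure completion a code LLM might produce with the secure alternative that an alignment dataset would prefer.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def find_user_insecure(conn, name):
    # CWE-89: user input is concatenated directly into the SQL string,
    # so a crafted name can change the structure of the query.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_secure(conn, name):
    # Parameterized query: the driver treats `name` strictly as data.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"  # classic injection string
    print("insecure:", find_user_insecure(conn, payload))
    print("secure:  ", find_user_secure(conn, payload))
```

With the injection payload, the insecure version returns every row in the table, while the parameterized version returns nothing because no user has that literal name.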

Code & Models

Repositories

PurCL/ProSec (official)

Taxonomy

Topics: Web Application Security Vulnerabilities · Digital Rights Management and Security · Digital and Cyber Forensics

Methods: ALIGN