CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability
Xianzhen Luo, Jingyuan Zhang, Shiqi Zhou, Rain Huang, Chuan Xiao, Qingfu Zhu, Zhiyuan Ma, Xing Yue, Yang Yue, Wencong Zeng, Wanxiang Che

TL;DR
CVE-Factory is a multi-agent framework that automatically transforms CVE metadata into high-quality, executable vulnerability tasks, enabling scalable code security evaluation and benchmarking.
Contribution
It introduces CVE-Factory for automated creation of expert-level vulnerability tasks, along with a new benchmark and large-scale training environments for code security agents.
Findings
Achieves 95% solution correctness and 96% environment fidelity.
Constructs LiveCVEBench with 190 tasks across 14 languages.
Fine-tuned models significantly outperform baselines on benchmark tasks.
Abstract
Evaluating and improving the security capabilities of code agents requires high-quality, executable vulnerability tasks. However, existing works rely on costly, unscalable manual reproduction and suffer from outdated data distributions. To address these, we present CVE-Factory, the first multi-agent framework to achieve expert-level quality in automatically transforming sparse CVE metadata into fully executable agentic tasks. Cross-validation against human expert reproductions shows that CVE-Factory achieves 95\% solution correctness and 96\% environment fidelity, confirming its expert-level quality. It is also evaluated on the latest realistic vulnerabilities and achieves a 66.2\% verified success. This automation enables two downstream contributions. First, we construct LiveCVEBench, a continuously updated benchmark of 190 tasks spanning 14 languages and 153 repositories that captures…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsInformation and Cyber Security · Advanced Malware Detection Techniques · Software Engineering Research
