SecCodeBench-V2 Technical Report

Longfei Chen; Ji Zhao; Lanxiao Cui; Tong Su; Xingbo Pan; Ziyang Li; Yongxing Wu; Qijiang Cao; Qiyao Cai; Jing Zhang; Yuandong Ni; Junyao He; Zeyu Zhang; Chao Ge; Xuhuai Lu; Zeyu Gao; Yuxin Cui; Weisen Chen; Yuxuan Peng; Shengping Wang; Qi Li; Yukai Huang; Yukun Liu; Tuo Zhou; Terry Yue Zhuo; Junyang Lin; Chao Zhang

arXiv:2602.15485·cs.CR·February 19, 2026


Open Access

TL;DR

SecCodeBench-V2 is a benchmark for evaluating how well LLM-based code copilots generate secure code and fix vulnerable code. It spans multiple programming languages and vulnerability classes, draws its scenarios from industrial code, and validates results with executable test cases.

Contribution

It introduces a new benchmark with 98 scenarios from industry, covering 22 CWE categories across five languages, with a unified evaluation pipeline and scoring protocol.

Findings

01. High-fidelity, expert-reviewed test cases ensure reliable ground truth.

02. Dynamic execution-based evaluation validates both correctness and security.

03. The benchmark enables holistic assessment of AI coding assistants' security capabilities.

Abstract

We introduce SecCodeBench-V2, a publicly released benchmark for evaluating Large Language Model (LLM) copilots' capabilities of generating secure code. SecCodeBench-V2 comprises 98 generation and fix scenarios derived from Alibaba Group's industrial productions, where the underlying security issues span 22 common CWE (Common Weakness Enumeration) categories across five programming languages: Java, C, Python, Go, and JavaScript. SecCodeBench-V2 adopts a function-level task formulation: each scenario provides a complete project scaffold and requires the model to implement or patch a designated target function under fixed interfaces and dependencies. For each scenario, SecCodeBench-V2 provides executable proof-of-concept (PoC) test cases for both functional validation and security verification. All test cases are authored and double-reviewed by security experts, ensuring high fidelity,…
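To make the function-level formulation in the abstract concrete, here is a minimal sketch of what one scenario and its two kinds of checks might look like. Everything in it is hypothetical and not taken from the benchmark: the target function `resolve_upload_path`, the path-traversal (CWE-22) weakness, the helper names, and the rule that a scenario passes only when both the functional check and the security PoC succeed. The report's actual harness and scoring protocol may differ.

```python
import os
import tempfile


def resolve_upload_path(base_dir: str, user_path: str) -> str:
    """Hypothetical target function: map a user-supplied path into base_dir."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    # Reject any path that resolves outside base_dir (CWE-22, path traversal).
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes base directory")
    return candidate


def naive_resolve(base_dir: str, user_path: str) -> str:
    """Vulnerable baseline for contrast: joins paths without any containment check."""
    return os.path.join(base_dir, user_path)


def functional_test(target, base_dir: str) -> bool:
    """Functional validation: a benign relative path must resolve inside base_dir."""
    expected = os.path.join(os.path.realpath(base_dir), "docs", "a.txt")
    return os.path.realpath(target(base_dir, "docs/a.txt")) == expected


def security_poc(target, base_dir: str) -> bool:
    """Security PoC: returns True when the traversal attempt is blocked."""
    base = os.path.realpath(base_dir)
    try:
        result = target(base_dir, "../../etc/passwd")
    except ValueError:
        return True  # The target refused the malicious input.
    # If a path came back, it must still be contained in base_dir.
    return os.path.commonpath([base, os.path.realpath(result)]) == base


def evaluate(target) -> dict:
    """Run both checks in a throwaway directory (assumed pass rule: both must hold)."""
    with tempfile.TemporaryDirectory() as base_dir:
        functional = functional_test(target, base_dir)
        secure = security_poc(target, base_dir)
    return {"functional": functional, "secure": secure, "passed": functional and secure}


if __name__ == "__main__":
    print("hardened:", evaluate(resolve_upload_path))
    print("naive:   ", evaluate(naive_resolve))
```

In this sketch the naive baseline is functionally correct but fails the PoC, which is the kind of case a purely functional benchmark would miss and an execution-based security check is meant to catch.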


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Scientific Computing and Data Management · Software Engineering Research