SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code

Xinghang Li; Jingzhe Ding; Chao Peng; Bing Zhao; Xiang Gao; Hongwan Gao; Xinchen Gu

arXiv:2506.05692 · cs.CR · June 23, 2025

TL;DR

SafeGenBench is a new benchmark framework that evaluates the security of code generated by large language models, revealing significant vulnerabilities and guiding future improvements in secure AI code generation.

Contribution

This work introduces SafeGenBench, the first comprehensive benchmark for assessing security vulnerabilities in LLM-generated code, along with an automatic evaluation framework.

Findings

01. State-of-the-art LLMs often produce vulnerable code

02. The benchmark reveals significant security deficiencies in current models

03. Provides insights for improving secure code generation in LLMs

Abstract

The code generation capabilities of large language models (LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code. In this work, we introduce SafeGenBench, a benchmark specifically designed to assess the security of LLM-generated code. The dataset encompasses a wide range of common software development scenarios and vulnerability types. Building upon this benchmark, we develop an automatic evaluation framework that leverages both static application security testing (SAST) and LLM-based judging to assess the presence of security vulnerabilities in model-generated code. Through an empirical evaluation of state-of-the-art LLMs on SafeGenBench, we reveal notable deficiencies in their ability to produce vulnerability-free code. Our findings highlight pressing…
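The abstract describes an evaluation framework that pairs a SAST pass with an LLM-based judge. As a rough illustration only, the sketch below shows one way two such verdicts could be combined into a per-sample decision and an aggregate rate. Everything in it is an assumption made for illustration: the names `Verdict`, `toy_sast_judge`, `toy_llm_judge`, `is_secure`, and `secure_rate` are hypothetical, both judges are trivial stand-ins (no real SAST tool or model is called), and the rule that a sample counts as secure only when neither judge flags it is an assumed aggregation rule, not one taken from this page.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    """Result of one judge on one code sample (hypothetical structure)."""
    vulnerable: bool
    reason: str = ""


Judge = Callable[[str], Verdict]


def toy_sast_judge(code: str) -> Verdict:
    # Stand-in for a SAST tool: flags a few hard-coded risky call patterns.
    risky_patterns = ["eval(", "os.system(", "pickle.loads("]
    hits = [p for p in risky_patterns if p in code]
    return Verdict(vulnerable=bool(hits), reason=", ".join(hits))


def toy_llm_judge(code: str) -> Verdict:
    # Stand-in for an LLM-based judge: a real one would prompt a model.
    # This stub only flags SQL text assembled by string concatenation.
    flagged = "SELECT" in code and "+" in code
    return Verdict(vulnerable=flagged, reason="string-built SQL" if flagged else "")


def is_secure(code: str, judges: List[Judge]) -> bool:
    # Assumed rule: a sample is secure only if no judge reports a vulnerability.
    return not any(judge(code).vulnerable for judge in judges)


def secure_rate(samples: List[str], judges: List[Judge]) -> float:
    # Fraction of samples that every judge considers free of vulnerabilities.
    if not samples:
        return 0.0
    return sum(is_secure(s, judges) for s in samples) / len(samples)


JUDGES: List[Judge] = [toy_sast_judge, toy_llm_judge]

SAMPLES = [
    "def add(a, b):\n    return a + b",
    'import os\nos.system("ls " + path)',
    'query = "SELECT * FROM users WHERE id = " + user_id',
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(is_secure(sample, JUDGES), repr(sample[:30]))
    print("secure rate:", secure_rate(SAMPLES, JUDGES))
```

Requiring agreement from both judges is the conservative choice in this sketch: a pattern-based scanner and a model-based reviewer tend to miss different things, so a sample passes only when neither objects.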

Taxonomy

Topics: Software Engineering Research · Information and Cyber Security · Web Application Security Vulnerabilities