SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code
Xinghang Li, Jingzhe Ding, Chao Peng, Bing Zhao, Xiang Gao, Hongwan Gao, Xinchen Gu

TL;DR
SafeGenBench is a new benchmark framework that evaluates the security of code generated by large language models, revealing significant vulnerabilities and guiding future improvements in secure AI code generation.
Contribution
This work introduces SafeGenBench, the first comprehensive benchmark for assessing security vulnerabilities in LLM-generated code, along with an automatic evaluation framework.
Findings
State-of-the-art LLMs often produce vulnerable code
The benchmark reveals significant security deficiencies in current models
Provides insights for improving secure code generation in LLMs
Abstract
The code generation capabilities of large language models (LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the security risks inherent in the generated code. In this work, we introduce SafeGenBench, a benchmark specifically designed to assess the security of LLM-generated code. The dataset encompasses a wide range of common software development scenarios and vulnerability types. Building upon this benchmark, we develop an automatic evaluation framework that leverages both static application security testing (SAST) and LLM-based judging to assess the presence of security vulnerabilities in model-generated code. Through the empirical evaluation of state-of-the-art LLMs on SafeGenBench, we reveal notable deficiencies in their ability to produce vulnerability-free code. Our findings highlight pressing…
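The abstract describes an evaluation framework that pairs a SAST pass with an LLM-based judge. The sketch below illustrates one way such a dual-judge check could be wired together. It is not the authors' implementation: the names `Verdict`, `evaluate`, `secure_rate`, `toy_sast`, and `toy_llm` are hypothetical, the rule that a sample counts as secure only when neither judge flags it is an assumption, and the two toy judges are trivial string checks standing in for a real SAST tool and a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A judge takes source code and returns True if it reports a vulnerability.
Judge = Callable[[str], bool]


@dataclass
class Verdict:
    sast_flagged: bool
    llm_flagged: bool

    @property
    def secure(self) -> bool:
        # Assumption: a sample is secure only if neither judge flags it.
        return not (self.sast_flagged or self.llm_flagged)


def evaluate(code: str, sast_judge: Judge, llm_judge: Judge) -> Verdict:
    """Run both judges on one generated code sample."""
    return Verdict(sast_flagged=sast_judge(code), llm_flagged=llm_judge(code))


def secure_rate(samples: Sequence[str], sast_judge: Judge, llm_judge: Judge) -> float:
    """Fraction of samples that neither judge flags (0.0 for an empty list)."""
    if not samples:
        return 0.0
    verdicts = [evaluate(s, sast_judge, llm_judge) for s in samples]
    return sum(v.secure for v in verdicts) / len(verdicts)


# Toy stand-ins (hypothetical): a real setup would invoke a SAST tool
# and query an LLM here instead of matching substrings.
def toy_sast(code: str) -> bool:
    return "eval(" in code


def toy_llm(code: str) -> bool:
    return "password = " in code


samples = ["print(1)", "eval(x)", 'password = "abc"', "x = 2"]
rate = secure_rate(samples, toy_sast, toy_llm)
```

Requiring both judges to pass is the stricter of the obvious combination rules; a different aggregation (for example, flagging only when both judges agree) would change the reported rate, and the abstract does not say which rule the benchmark uses.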
Taxonomy
Topics: Software Engineering Research · Information and Cyber Security · Web Application Security Vulnerabilities
