Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval

Jiexin Wang; Xitong Luo; Liuwen Cao; Hongkui He; Hailin Huang; Jiayuan Xie; Adam Jatowt; Yi Cai

arXiv:2407.02395·cs.SE·July 8, 2024·2 cites

Open Access

TL;DR

This paper evaluates the security of code generated by large language models, introduces a new dataset for vulnerability assessment, and proposes strategies to improve the security awareness of code generation and repair models.

Contribution

The study presents CodeSecEval, a curated dataset for security evaluation, and analyzes the security shortcomings of current code LLMs, proposing mitigation strategies to enhance their safety.

Findings

01. Current models often generate vulnerable code.
02. Vulnerability-aware strategies can reduce security risks.
03. Certain vulnerability types significantly impact model performance.

Abstract

Large language models (LLMs) have brought significant advancements to code generation and code repair, benefiting both novice and experienced developers. However, their training on unsanitized data from open-source repositories, such as GitHub, raises the risk of inadvertently propagating security vulnerabilities. Despite numerous studies investigating the safety of code LLMs, there remains a gap in comprehensively addressing their security features. In this work, we present a comprehensive study aimed at precisely evaluating and enhancing the security aspects of code LLMs. To support our research, we introduce CodeSecEval, a meticulously curated dataset designed to address 44 critical vulnerability types with 180 distinct samples. CodeSecEval serves as the foundation for the automatic evaluation of code models in two crucial tasks: code generation and code repair, with a strong…

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Reliability and Analysis Research