CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models

Hossein Hajipour, Keno Hassler, Thorsten Holz, Lea Schönherr, Mario Fritz

arXiv:2302.04012 · cs.CR · October 24, 2023 · 6 cites

TL;DR

This paper introduces a systematic method and a benchmark for evaluating security vulnerabilities in black-box code language models. It focuses on their tendency to generate insecure code, which they learn from unsanitized training data.

Contribution

It presents the first approach to automatically identify security vulnerabilities in black-box code generation models and creates a benchmark dataset for evaluating their security weaknesses.
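
To make the benchmark idea concrete, here is a minimal Python sketch of how a set of categorized prompts could be used to score a black-box model. It samples several completions per prompt, passes each prompt-plus-completion to a vulnerability checker, and reports the fraction flagged per category. The names `vulnerable_rate`, `generate`, and `is_vulnerable` are hypothetical. `generate` stands in for whatever API the model exposes, and `is_vulnerable` stands in for a static analyzer, for example a wrapper around CodeQL queries. Labeling categories with CWE identifiers and scoring by flagged fraction are both assumptions made for illustration. Neither is presented as the paper's exact metric or tooling.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def vulnerable_rate(
    prompts: Iterable[Tuple[str, str]],
    generate: Callable[[str, int], List[str]],
    is_vulnerable: Callable[[str, str], bool],
    samples_per_prompt: int = 5,
) -> Dict[str, float]:
    """Fraction of sampled completions flagged as vulnerable, per category.

    prompts: (category, prompt) pairs, e.g. an assumed CWE label and a code prefix.
    generate: hypothetical stand-in for a black-box model; returns n completions.
    is_vulnerable: hypothetical stand-in for a static analyzer; receives the
        category and the full program (prompt + completion).
    """
    flagged: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, prompt in prompts:
        for completion in generate(prompt, samples_per_prompt):
            total[category] += 1
            # Analyze the whole program, since the prompt itself sets up context.
            if is_vulnerable(category, prompt + completion):
                flagged[category] += 1
    # Only categories that received at least one completion appear,
    # so no division by zero can occur.
    return {cat: flagged[cat] / total[cat] for cat in total}
```

With stub callables, a single prompt labeled `CWE-78` whose two sampled completions include one `os.system` call scores 0.5. Keeping the model and the analyzer as plain callables lets the same loop wrap any black-box API or checker.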

Findings

1. Effective approximation of the inversion of black-box code models using few-shot prompting
2. Demonstrated that the models generate code containing high-risk security vulnerabilities
3. Established a diverse benchmark dataset for security evaluation
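
The first finding relies on few-shot prompting. As a rough illustration only, a few-shot prompt can be assembled by concatenating a handful of example snippets known to be vulnerable and ending with an open slot. A model continuing that text is nudged to produce a new snippet in the same style. The `build_few_shot_prompt` helper and the `SEPARATOR` string are hypothetical. This sketch does not reproduce the authors' actual prompt template.

```python
from typing import Sequence

# Hypothetical delimiter between examples; the paper's real template may differ.
SEPARATOR = "\n###\n"


def build_few_shot_prompt(examples: Sequence[str], k: int = 3) -> str:
    """Join the first k example snippets and leave an open slot at the end.

    The trailing separator marks where the model should start writing a
    new snippet that resembles the examples.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    chosen = list(examples[:k])
    if not chosen:
        raise ValueError("need at least one example")
    return SEPARATOR.join(chosen) + SEPARATOR
```

Passing two snippets with the default `k=3` yields both snippets joined by the separator, followed by one trailing separator for the model to continue from.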

Abstract

Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Their advances in competition-level programming problems have made them an essential pillar of AI-assisted pair programming, and tools such as GitHub Copilot have emerged as part of the daily programming workflow used by millions of developers. The training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure. While these models have been extensively assessed for their ability to produce functionally correct programs, there remains a lack of comprehensive investigations and benchmarks addressing the security aspects of…

Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Testing and Debugging Techniques