How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models

Norbert Tihanyi; Tamas Bisztray; Mohamed Amine Ferrag; Ridhi Jain; Lucas C. Cordeiro

arXiv:2404.18353·cs.CR·December 12, 2024

TL;DR

This large-scale study evaluates the security of AI-generated C code from several LLMs. Over 62% of the generated programs contain vulnerabilities, and the differences between models are minor, which underlines the need for validation.

Contribution

The paper introduces the FormAI-v2 dataset and provides a comprehensive comparison of multiple state-of-the-art LLMs regarding their tendency to produce vulnerable code.

Findings

01 Over 62% of generated programs are vulnerable.

02 Models show similar vulnerability patterns with minor differences.

03 Formal verification reduces false positives and negatives in vulnerability detection.

Abstract

This study compares state-of-the-art Large Language Models (LLMs) on their tendency to generate vulnerabilities when writing C programs using a neutral zero-shot prompt. Tihanyi et al. introduced the FormAI dataset at PROMISE'23, featuring 112,000 C programs generated by GPT-3.5-turbo, with over 51.24% identified as vulnerable. We extended that research with a large-scale study involving 9 state-of-the-art models such as OpenAI's GPT-4o-mini, Google's Gemini Pro 1.0, TII's 180 billion-parameter Falcon, Meta's 13 billion-parameter Code Llama, and several other compact models. Additionally, we introduce the FormAI-v2 dataset, which comprises 331,000 compilable C programs generated by these LLMs. Each program in the dataset is labeled based on the vulnerabilities detected in its source code through formal verification, using the Efficient SMT-based Context-Bounded Model Checker (ESBMC).…
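
To make the labeling idea concrete, here is a minimal sketch of how a per-model share of vulnerable programs could be computed once each program carries a verification outcome. The record layout, the model names, the outcome strings, and the `vulnerable_share` helper are all assumptions made for illustration; they are not the FormAI-v2 schema, and the records are toy data, not results from the paper.

```python
from collections import defaultdict

# Hypothetical records: (model name, verification outcome).
# Toy data for illustration only; not taken from the dataset.
records = [
    ("model_a", "VERIFICATION FAILED"),
    ("model_a", "VERIFICATION SUCCESSFUL"),
    ("model_b", "VERIFICATION FAILED"),
    ("model_b", "VERIFICATION FAILED"),
    ("model_b", "VERIFICATION UNKNOWN"),
]


def vulnerable_share(records):
    """Return {model: fraction of its programs whose verification failed}."""
    total = defaultdict(int)
    failed = defaultdict(int)
    for model, outcome in records:
        total[model] += 1
        # Assumption: a failed verification means a vulnerability was found.
        if outcome == "VERIFICATION FAILED":
            failed[model] += 1
    return {model: failed[model] / total[model] for model in total}


shares = vulnerable_share(records)
for model, share in sorted(shares.items()):
    print(f"{model}: {share:.1%} of programs labelled vulnerable")
```

In this sketch, programs with an inconclusive outcome count toward the total but not toward the vulnerable share; whether the study treats them that way is not stated in the text above.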

Code & Models

Repositories

cybermetric/cybermetric

Taxonomy

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer · Label Smoothing · Adam · Layer Normalization · Attention Dropout