Assessing the Quality and Security of AI-Generated Code: A Quantitative Analysis

Abbas Sabra; Olivier Schmitt; Joseph Tyler

arXiv:2508.14727·cs.SE·August 21, 2025



TL;DR

This paper quantitatively evaluates the code quality and security of five major LLMs generating Java code, revealing systemic weaknesses and security vulnerabilities that are not indicated by functional performance metrics.

Contribution

The paper provides a comprehensive static analysis of LLM-generated code, highlights security and quality issues shared across multiple models, and emphasizes the importance of verification beyond functional testing.

Findings

01. LLMs can generate functional code, but that code often contains bugs and security vulnerabilities.

02. Functional success shows no correlation with code security or quality.

03. The defects point to shared, systemic weaknesses in current LLM code generation methods.

Abstract

This study presents a quantitative evaluation of the code quality and security of five prominent Large Language Models (LLMs): Claude Sonnet 4, Claude 3.7 Sonnet, GPT-4o, Llama 3.2 90B, and OpenCoder 8B. While prior research has assessed the functional performance of LLM-generated code, this research tested LLM output from 4,442 Java coding assignments through comprehensive static analysis using SonarQube. The findings suggest that although LLMs can generate functional code, they also introduce a range of software defects, including bugs, security vulnerabilities, and code smells. These defects do not appear to be isolated; rather, they may represent shared weaknesses stemming from systemic limitations within current LLM code generation methods. In particular, critically severe issues, such as hard-coded passwords and path traversal vulnerabilities, were observed across multiple models.…
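To make the two defect classes named in the abstract concrete, the following is a minimal Java sketch of a path traversal flaw and a hard-coded credential, each next to a more defensive alternative. It is an illustration only, not code from the paper's dataset or a reproduction of any SonarQube rule. The class name `DefectSketch`, all method names, the `uploads` directory, and the `APP_DB_PASSWORD` variable are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration; none of these names come from the paper.
final class DefectSketch {

    // Flawed pattern: a literal credential embedded in source code.
    // The value is a placeholder, not a real secret.
    static final String HARD_CODED_PASSWORD = "changeme";

    // Flawed pattern: untrusted input is joined to a base directory
    // with no check, so "../" segments can escape that directory.
    static Path resolveUnchecked(Path baseDir, String userSuppliedName) {
        return baseDir.resolve(userSuppliedName).normalize();
    }

    // More defensive pattern: normalize, then verify that the result
    // is still located under the base directory before using it.
    static Path resolveInside(Path baseDir, String userSuppliedName) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path candidate = base.resolve(userSuppliedName).normalize();
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException(
                "Path escapes base directory: " + userSuppliedName);
        }
        return candidate;
    }

    // More defensive pattern: read the secret from the environment
    // (the variable name is supplied by the caller and is an assumption).
    static String passwordFromEnvironment(String variableName) {
        String value = System.getenv(variableName);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException(
                "Missing environment variable: " + variableName);
        }
        return value;
    }

    public static void main(String[] args) {
        Path base = Paths.get("uploads");
        // The unchecked variant silently resolves outside "uploads".
        System.out.println("unchecked: " + resolveUnchecked(base, "../secrets.txt"));
        try {
            resolveInside(base, "../secrets.txt");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Both flawed variants compile and would pass a functional test that only supplies well-formed file names, which is the kind of gap between functional correctness and security that the abstract describes.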


Taxonomy

Topics: Law, AI, and Intellectual Property · Software Engineering Research · Advanced Malware Detection Techniques