Is LLM-Generated Code More Maintainable & Reliable than Human-Written Code?

Alfred Santa Molison; Marcia Moraes; Glaucia Melo; Fabio Santos; Wesley K. G. Assuncao

arXiv:2508.00700·cs.SE·August 4, 2025

TL;DR

This study empirically compares the maintainability and reliability of LLM-generated and human-written Python code across three difficulty levels (introductory, interview, and competition). It finds that LLM-generated code tends to have fewer bugs, but that LLM solutions to complex tasks can introduce structural issues.

Contribution

It provides a comprehensive empirical analysis of LLM-generated code quality, highlighting strengths in bug reduction and limitations in structural correctness for complex problems.

Findings

1. LLM-generated code has fewer bugs and requires less effort to fix.
2. Fine-tuning reduces high-severity issues but may decrease overall performance.
3. In complex tasks, LLM solutions can introduce structural issues not seen in human code.

Abstract

Background: The rise of Large Language Models (LLMs) in software development has opened new possibilities for code generation. Despite the widespread use of this technology, it remains unclear how well LLMs generate code solutions in terms of software quality and how they compare to human-written code. Aims: This study compares the internal quality attributes of LLM-generated and human-written code. Method: Our empirical study integrates datasets of coding tasks, three LLM configurations (zero-shot, few-shot, and fine-tuning), and SonarQube to assess software quality. The dataset comprises Python code solutions across three difficulty levels: introductory, interview, and competition. We analyzed key code quality metrics, including maintainability and reliability, and the estimated effort required to resolve code issues. Results: Our analysis shows that LLM-generated code has fewer bugs…
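To illustrate the kind of comparison the Method describes, the following is a minimal sketch of how static-analysis findings could be tallied per code source. It assumes issue records have already been exported from an analysis tool such as SonarQube into a simple in-memory form. The `Issue` record, the `summarize` helper, the `"human"`/`"llm"` labels, and the sample values are all hypothetical and invented for illustration; they are not the authors' pipeline, the SonarQube API, or data from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """One static-analysis finding (hypothetical record layout)."""
    source: str       # who wrote the code: "human" or "llm" (assumed labels)
    issue_type: str   # e.g. "BUG" (reliability) or "CODE_SMELL" (maintainability)
    severity: str     # e.g. "MINOR", "MAJOR", "CRITICAL"
    effort_min: int   # estimated minutes needed to fix the issue


def summarize(issues):
    """Count bugs and code smells and total the fix effort per code source."""
    summary = defaultdict(lambda: {"bugs": 0, "code_smells": 0, "effort_min": 0})
    for issue in issues:
        row = summary[issue.source]
        if issue.issue_type == "BUG":
            row["bugs"] += 1
        elif issue.issue_type == "CODE_SMELL":
            row["code_smells"] += 1
        row["effort_min"] += issue.effort_min
    return dict(summary)


# Invented sample records, NOT results from the paper.
sample = [
    Issue("human", "BUG", "MAJOR", 15),
    Issue("human", "CODE_SMELL", "MINOR", 5),
    Issue("llm", "CODE_SMELL", "CRITICAL", 20),
]

result = summarize(sample)
for source, row in sorted(result.items()):
    print(source, row)
```

Grouping by source keeps the comparison symmetric: the same counting rules apply to both sets of solutions, so differences in the totals reflect the findings rather than the bookkeeping.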

Taxonomy

Topics: Software Engineering Research · Artificial Intelligence in Healthcare and Education · Topic Modeling