Is LLM-Generated Code More Maintainable & Reliable than Human-Written Code?
Alfred Santa Molison, Marcia Moraes, Glaucia Melo, Fabio Santos, Wesley K. G. Assuncao

TL;DR
This study empirically compares the maintainability and reliability of LLM-generated code to human-written code across various difficulty levels, revealing that LLMs often produce less bug-prone code but can introduce structural issues in complex tasks.
Contribution
It provides a comprehensive empirical analysis of LLM-generated code quality, highlighting strengths in bug reduction and limitations in structural correctness for complex problems.
Findings
LLM-generated code has fewer bugs and requires less effort to fix.
Fine-tuning reduces high-severity issues but may decrease overall performance.
In complex tasks, LLM solutions can introduce structural issues not seen in human code.
Abstract
Background: The rise of Large Language Models (LLMs) in software development has opened new possibilities for code generation. Despite the widespread use of this technology, it remains unclear how well LLMs generate code solutions in terms of software quality and how they compare to human-written code. Aims: This study compares the internal quality attributes of LLM-generated and human-written code. Method: Our empirical study integrates datasets of coding tasks, three LLM configurations (zero-shot, few-shot, and fine-tuning), and SonarQube to assess software quality. The dataset comprises Python code solutions across three difficulty levels: introductory, interview, and competition. We analyzed key code quality metrics, including maintainability and reliability, and the estimated effort required to resolve code issues. Results: Our analysis shows that LLM-generated code has fewer bugs…
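The comparison described in the Method can be pictured as a tally of static-analysis issues per code source. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the records, field names (`source`, `type`, `severity`, `effort_min`), and the `summarize` helper are all hypothetical, and the values are made up for illustration. The `type` and `severity` labels follow SonarQube's usual categories, where bugs relate to reliability and code smells to maintainability.

```python
from collections import defaultdict

# Hypothetical issue records (illustrative values, NOT data from the study).
# Each record mimics one finding reported by a static analyzer.
ISSUES = [
    {"source": "human", "type": "BUG", "severity": "MAJOR", "effort_min": 10},
    {"source": "human", "type": "CODE_SMELL", "severity": "MINOR", "effort_min": 5},
    {"source": "llm", "type": "CODE_SMELL", "severity": "CRITICAL", "effort_min": 15},
]


def summarize(issues):
    """Count issues by type and sum estimated fix effort for each code source."""
    summary = defaultdict(lambda: {"counts": defaultdict(int), "effort_min": 0})
    for issue in issues:
        entry = summary[issue["source"]]
        entry["counts"][issue["type"]] += 1
        entry["effort_min"] += issue["effort_min"]
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {
        source: {"counts": dict(entry["counts"]), "effort_min": entry["effort_min"]}
        for source, entry in summary.items()
    }


if __name__ == "__main__":
    for source, stats in summarize(ISSUES).items():
        print(source, stats)
```

Grouping by source first keeps the human and LLM tallies directly comparable. A fuller analysis would presumably also split the records by difficulty level and LLM configuration.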
Taxonomy
Topics: Software Engineering Research · Artificial Intelligence in Healthcare and Education · Topic Modeling
