Bugs in Large Language Models Generated Code: An Empirical Study
Florian Tambon, Arghavan Moradi Dakhel, Amin Nikanjam, Foutse Khomh, Michel C. Desmarais, Giuliano Antoniol

TL;DR
This empirical study analyzes 333 bugs in code generated by large language models, identifying 10 common bug patterns and validating their significance through a survey, to improve quality assurance in AI-assisted coding.
Contribution
The paper introduces a taxonomy of 10 bug patterns in LLM-generated code and validates their relevance with practitioners, advancing understanding of AI-generated software bugs.
Findings
Identified 10 distinctive bug patterns in LLM-generated code.
Validated bug patterns through a survey with 34 practitioners.
Provided insights for developing quality assurance techniques for LLM-generated code.
Abstract
Large Language Models (LLMs) for code have gained significant attention recently. They can generate code in different programming languages based on provided prompts, fulfilling a long-lasting dream in Software Engineering (SE), i.e., automatic code generation. Similar to human-written code, LLM-generated code is prone to bugs, and these bugs have not yet been thoroughly examined by the community. Given the increasing adoption of LLM-based code generation tools (e.g., GitHub Copilot) in SE activities, it is critical to understand the characteristics of bugs contained in code generated by LLMs. This paper examines a sample of 333 bugs collected from code generated using three leading LLMs (i.e., CodeGen, PanGu-Coder, and Codex) and identifies the following 10 distinctive bug patterns: Misinterpretations, Syntax Error, Silly Mistake, Prompt-biased code, Missing Corner Case, Wrong Input…
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Testing and Debugging Techniques
Methods: CodeGen
