Understanding Defects in Generated Codes by Language Models
Ali Mohammadi Esfahani, Nafiseh Kahani, Samuel A. Ajila

TL;DR
This paper analyzes defects in code generated by Large Language Models, categorizes them by their nature, and demonstrates that structured prompt engineering techniques can significantly reduce these errors and improve reliability.
Contribution
It introduces a structured defect classification for LLM-generated code and evaluates five prompt engineering methods to enhance code accuracy.
Findings
Structured prompts reduce defect rates in generated code
Functionality and algorithm errors are the most common defects
Prompt engineering techniques improve code reliability
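Identifying the "most common" defect categories amounts to tallying classified defects and ranking the categories by frequency. The sketch below illustrates only that bookkeeping step. The labels are invented toy data, not the paper's 367 defects. Only "functionality" and "algorithm" are category names taken from this summary; "syntax" is an assumed extra category.

```python
from collections import Counter

# Hypothetical toy labels for illustration only; NOT the paper's data.
# "syntax" is an assumed category; the paper's full taxonomy is not listed here.
defect_labels = [
    "functionality", "algorithm", "functionality",
    "syntax", "algorithm", "functionality",
]

def category_shares(labels):
    """Return (category, count, share) tuples, most frequent category first."""
    counts = Counter(labels)
    total = sum(counts.values())
    # most_common() sorts by descending count
    return [(cat, n, n / total) for cat, n in counts.most_common()]

for cat, n, share in category_shares(defect_labels):
    print(f"{cat:<14} {n:>3}  {share:.1%}")
```

With these toy labels, "functionality" ranks first with half of the defects. On real data, the same ranking would surface whichever categories dominate.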
Abstract
This study investigates the reliability of code generation by Large Language Models (LLMs), focusing on identifying and analyzing defects in the generated code. Despite the advanced capabilities of LLMs in automating code generation, ensuring the accuracy and functionality of their output remains a significant challenge. Using a structured defect classification method to understand the nature and origins of these defects, this study categorizes and analyzes 367 defects identified in code snippets generated by LLMs, a significant proportion of which are functionality and algorithm errors. These error categories indicate key areas where LLMs frequently fail, underscoring the need for targeted improvements. To enhance the accuracy of code generation, this paper implements five prompt engineering techniques, including Scratchpad Prompting, Program of Thoughts Prompting, Chain-of-Thought Prompting, Chain…
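The prompting techniques named in the abstract differ mainly in how they ask the model to reason before producing code. Scratchpad prompting asks for intermediate computation notes. Program of Thoughts asks for reasoning expressed as executable code. Chain-of-Thought asks for step-by-step natural-language reasoning. The following is a minimal sketch, assuming a simple template approach. The template wording, the `TEMPLATES` dictionary, and the `build_prompt` helper are hypothetical and are not the prompts used in the paper.

```python
# Hypothetical prompt templates; the wording is an assumption,
# not the prompts used in the paper.
TEMPLATES = {
    # Baseline: ask for code directly, with no reasoning scaffold.
    "plain": "{task}\nWrite a Python function that solves this task.",
    # Scratchpad: request intermediate notes on program state before the code.
    "scratchpad": (
        "{task}\nFirst write intermediate notes in a scratchpad section, "
        "tracing how the program state changes step by step, "
        "then give the final code."
    ),
    # Program of Thoughts: express the reasoning itself as executable code.
    "program_of_thoughts": (
        "{task}\nExpress your reasoning as executable Python: compute "
        "intermediate values in code with descriptive variable names, "
        "then return the result."
    ),
    # Chain-of-Thought: natural-language step-by-step reasoning before the code.
    "chain_of_thought": (
        "{task}\nLet's think step by step. "
        "Explain your reasoning before writing the code."
    ),
}

def build_prompt(technique: str, task: str) -> str:
    """Wrap a task description in the template for the chosen technique."""
    if technique not in TEMPLATES:
        raise ValueError(f"unknown technique: {technique!r}")
    return TEMPLATES[technique].format(task=task)

prompt = build_prompt(
    "chain_of_thought", "Return the sum of the even numbers in a list."
)
print(prompt)
```

Keeping the task text separate from the technique-specific scaffold makes it straightforward to run the same task set under each technique and compare defect counts across them.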
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Software Engineering Research · Scheduling and Optimization Algorithms
