What's Wrong with Your Code Generated by Large Language Models? An Extensive Study
Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Muling Wu, Yunbo Tao, Ming Zhang, Mingxu Chai, Jessica Fan, Zhiheng Xi, Rui Zheng, Yueming Wu, Ming Wen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang

TL;DR
This empirical study evaluates the code generation performance and limitations of several large language models. It finds that the models struggle with complex problems, categorizes the types of bugs they produce, and proposes a self-critique method for correcting generated code.
Contribution
The paper provides a comprehensive analysis of LLMs' code generation capabilities, introduces a bug taxonomy, and proposes a novel self-critique method to improve code quality without additional training.
Findings
LLMs struggle with complex problems, producing shorter yet more complicated code.
A bug taxonomy with 3 categories and 10 sub-categories was developed and analyzed.
A self-critique iterative method improves code correctness without retraining.
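The summary above does not spell out how the self-critique loop works, so the following is only a minimal sketch of one plausible shape for a training-free generate, check, critique, regenerate cycle. The names `run_candidate`, `self_critique_loop`, `generate`, and `critique` are hypothetical, and using execution errors as the feedback signal is an assumption rather than the paper's stated procedure.

```python
from typing import Callable, Optional, Tuple


def run_candidate(code: str, test: str) -> Optional[str]:
    """Run the candidate code and then the test in one fresh namespace.

    Returns None on success, or a short error description on failure.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate's functions
        exec(test, namespace)   # run assertions against them
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def self_critique_loop(
    problem: str,
    generate: Callable[[str, Optional[str]], str],  # hypothetical LLM call
    critique: Callable[[str, str, str], str],       # hypothetical LLM call
    test: str,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Iteratively regenerate code using critique text; no model retraining."""
    code = generate(problem, None)
    for _ in range(max_rounds):
        error = run_candidate(code, test)
        if error is None:
            return code, True
        # Ask the model to explain what went wrong, then retry with that hint.
        feedback = critique(problem, code, error)
        code = generate(problem, feedback)
    # Check the final regenerated candidate once more.
    return code, run_candidate(code, test) is None
```

In practice `generate` and `critique` would wrap calls to the same LLM with different prompts, and untrusted generated code should run in a sandbox rather than through a bare `exec`.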
Abstract
The increasing development of LLMs in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundaries of existing methods. To bridge this gap, we conducted an extensive empirical study evaluating the performance of three leading closed-source LLMs and six popular open-source LLMs on three commonly used benchmarks. Our investigation, which evaluated the length, cyclomatic complexity and API number of the generated code, revealed that these LLMs face challenges in generating successful code for more complex problems, and tend to produce code that is shorter yet more complicated as compared to canonical solutions.…
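To make the three code characteristics named in the abstract concrete, here is a rough sketch that measures them for a Python snippet using only the standard `ast` module. The function `code_metrics` is hypothetical, and the exact definitions are assumptions: length is counted as non-blank lines, cyclomatic complexity is approximated as one plus the number of branch points, and "API number" is taken to mean the number of distinct called names. The paper may define these differently.

```python
import ast

# Node types treated as one decision point each (an approximation).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def code_metrics(source: str) -> dict:
    """Return approximate length, cyclomatic complexity, and API count."""
    tree = ast.parse(source)
    non_blank_lines = [line for line in source.splitlines() if line.strip()]

    complexity = 1          # a straight-line program has complexity 1
    called_names = set()    # distinct function or method names that are called

    for node in ast.walk(tree):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # Each extra operand of `and` / `or` adds one more path.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                called_names.add(func.id)
            elif isinstance(func, ast.Attribute):
                called_names.add(func.attr)

    return {
        "lines": len(non_blank_lines),
        "cyclomatic_complexity": complexity,
        "api_count": len(called_names),
    }
```

Comparing these numbers for a generated solution and its canonical solution gives one way to check the "shorter yet more complicated" observation on a per-problem basis.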
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Softmax · Attention Is All You Need
