A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair
Quanjun Zhang, Tongke Zhang, Juan Zhai, Chunrong Fang, Bowen Yu, Weisong Sun, Zhenyu Chen

TL;DR
This paper reviews ChatGPT's bug-fixing capabilities in software engineering, introducing a new benchmark and demonstrating its effectiveness compared to other models, while discussing challenges and future opportunities.
Contribution
It presents a new benchmark for evaluating ChatGPT's bug-fixing ability, and provides a comprehensive analysis of its performance and potential in software engineering tasks.
Findings
ChatGPT fixed 109 out of 151 bugs on the new benchmark
Prompt engineering enabled ChatGPT to fix 34 additional bugs
ChatGPT outperformed CodeT5 and PLBART by 27.5% and 62.4% in accuracy
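The counts above can be turned into fix rates with a little arithmetic. The following is a minimal sketch, not the paper's evaluation code: the function name `fix_rate` and the constants are hypothetical, and the combined figure assumes the 34 additional fixes are distinct from the initial 109, which this summary does not state explicitly.

```python
def fix_rate(fixed: int, total: int) -> float:
    """Return the fraction of benchmark bugs that were fixed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return fixed / total


TOTAL_BUGS = 151   # benchmark size reported in the findings
BASE_FIXED = 109   # bugs fixed as reported in the findings
EXTRA_FIXED = 34   # assumption: additional fixes, disjoint from the 109

base_rate = fix_rate(BASE_FIXED, TOTAL_BUGS)
combined_rate = fix_rate(BASE_FIXED + EXTRA_FIXED, TOTAL_BUGS)

print(f"base fix rate: {base_rate:.1%}")          # base fix rate: 72.2%
print(f"combined fix rate: {combined_rate:.1%}")  # combined fix rate: 94.7%
```

Under that assumption, 109 of 151 corresponds to roughly 72.2%, and 143 of 151 to roughly 94.7%.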
Abstract
Large Language Models (LLMs) have attracted increasing attention and demonstrated promising performance across a variety of Software Engineering (SE) tasks, such as Automated Program Repair (APR), code summarization, and code completion. For example, ChatGPT, the latest black-box LLM, has been investigated by numerous recent research studies and has shown impressive performance in various tasks. However, there is a potential risk of data leakage, since these LLMs are usually closed-source and their training details, e.g., pre-training datasets, are unknown. In this paper, we seek to review the bug-fixing capabilities of ChatGPT on a clean APR benchmark with different research objectives. We first introduce a new benchmark of buggy programs and their corresponding fixes, drawn from competitive programming problems starting from 2023, after the training cutoff point of…
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research
