A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair

Quanjun Zhang; Tongke Zhang; Juan Zhai; Chunrong Fang; Bowen Yu; Weisong Sun; Zhenyu Chen

arXiv:2310.08879 · cs.SE · April 18, 2024 · 31 citations

Open Access

TL;DR

This paper reviews ChatGPT's bug-fixing capabilities in software engineering. It introduces a new benchmark, shows that ChatGPT outperforms other models on it, and discusses challenges and future opportunities.

Contribution

It presents a new benchmark for evaluating ChatGPT's bug-fixing ability and provides a comprehensive analysis of ChatGPT's performance and potential in software engineering tasks.

Findings

01

ChatGPT fixed 109 out of 151 bugs on the new benchmark

02

Prompt engineering enabled ChatGPT to fix 34 additional bugs

03

ChatGPT outperformed CodeT5 and PLBART by 27.5% and 62.4% in accuracy, respectively
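
For readers who want the first finding as a rate, the arithmetic can be sketched as follows. The function name `fix_rate` is hypothetical and introduced only for illustration; the counts 109 and 151 are the ones reported above.

```python
def fix_rate(fixed: int, total: int) -> float:
    """Return the share of bugs fixed, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * fixed / total


# Counts taken from the first finding above.
rate = fix_rate(109, 151)
print(f"{rate:.1f}%")  # 72.2%
```

This covers only the headline fix rate; it does not reproduce how the comparison figures against CodeT5 and PLBART were computed.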

Abstract

Large Language Models (LLMs) have attracted increasing attention and demonstrated promising performance across a variety of Software Engineering (SE) tasks, such as Automated Program Repair (APR), code summarization, and code completion. For example, ChatGPT, the latest black-box LLM, has been investigated in numerous recent studies and has shown impressive performance on various tasks. However, there is a potential risk of data leakage, since these LLMs are usually closed-source and their training details, e.g., pre-training datasets, are unknown. In this paper, we review the bug-fixing capabilities of ChatGPT on a clean APR benchmark with different research objectives. We first introduce a new benchmark of buggy programs and their corresponding fixes, drawn from competitive programming problems starting from 2023, after the training cutoff point of…
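
The leakage-avoidance idea in the abstract, keeping only problems published after a model's training data ends, might be sketched as a simple date filter. This is an illustrative assumption, not the authors' pipeline: the `Problem` type, the `after_cutoff` helper, and the sample IDs are hypothetical, and the threshold of 1 January 2023 simply encodes the abstract's "starting from 2023".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    """Hypothetical record for one competitive programming problem."""
    problem_id: str
    published: date


# Assumption: "starting from 2023" is read as on or after 1 January 2023.
CUTOFF = date(2023, 1, 1)


def after_cutoff(problems, cutoff=CUTOFF):
    """Keep only problems published on or after the cutoff date."""
    return [p for p in problems if p.published >= cutoff]


# Made-up sample data for illustration only.
sample = [
    Problem("old-001", date(2022, 11, 5)),
    Problem("new-001", date(2023, 1, 1)),
    Problem("new-002", date(2023, 3, 14)),
]
print([p.problem_id for p in after_cutoff(sample)])  # ['new-001', 'new-002']
```

Filtering on publication date only limits one leakage path; whether a given problem's solutions appeared elsewhere earlier is outside what this sketch checks.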

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research