Fight Fire with Fire: How Much Can We Trust ChatGPT on Source Code-Related Tasks?

Xiao Yu; Lei Liu; Xing Hu; Jacky Wai Keung; Jin Liu; Xin Xia

arXiv:2405.12641 · cs.SE · December 2, 2024 · 2 cites

TL;DR

This paper empirically evaluates ChatGPT's ability to self-verify its code generation, completion, and repair, revealing significant inaccuracies and hallucinations, and suggesting ways to improve its reliability in software development tasks.

Contribution

It provides a comprehensive empirical assessment of ChatGPT's self-verification capabilities in code tasks, highlighting limitations and potential improvements.

Findings

01

ChatGPT often misjudges incorrect code as correct.

02

Hallucinations in self-verification behavior are observed.

03

Guiding questions improve self-verification accuracy.
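One way to quantify the first finding is to compare the model's own verdicts against ground truth, such as the outcome of running a test suite. The sketch below is a minimal illustration of that comparison, not the paper's evaluation code: the `Sample` type, the `self_verification_metrics` function, and the metric names are hypothetical, and the example data is made up.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # What the model said about its own output (hypothetical field name).
    self_verdict_correct: bool
    # Ground truth, e.g. obtained by running tests (hypothetical field name).
    actually_correct: bool


def self_verification_metrics(samples):
    """Return agreement with ground truth and the share of bad code judged good."""
    total = len(samples)
    # A verdict agrees when the model's claim matches the ground truth.
    agree = sum(s.self_verdict_correct == s.actually_correct for s in samples)
    # Restrict to code that is actually incorrect, then count the cases
    # where the model still declared it correct.
    incorrect = [s for s in samples if not s.actually_correct]
    missed = sum(s.self_verdict_correct for s in incorrect)
    return {
        "accuracy": agree / total if total else 0.0,
        "missed_bug_rate": missed / len(incorrect) if incorrect else 0.0,
    }


if __name__ == "__main__":
    # Illustrative data only; not taken from the paper.
    demo = [
        Sample(True, True),
        Sample(True, False),
        Sample(False, False),
        Sample(True, False),
    ]
    print(self_verification_metrics(demo))
```

A high `missed_bug_rate` alongside a moderate `accuracy` would correspond to the pattern described above, where incorrect code is frequently judged correct.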

Abstract

With the increasing utilization of large language models such as ChatGPT during software development, it has become crucial to verify the quality of code content it generates. Recent studies proposed utilizing ChatGPT as both a developer and tester for multi-agent collaborative software development. The multi-agent collaboration empowers ChatGPT to produce test reports for its generated code, enabling it to self-verify the code content and fix bugs based on these reports. However, these studies did not assess the effectiveness of the generated test reports in validating the code. Therefore, we conduct a comprehensive empirical investigation to evaluate ChatGPT's self-verification capability in code generation, code completion, and program repair. We request ChatGPT to (1) generate correct code and then self-verify its correctness; (2) complete code without vulnerabilities and then…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Privacy-Preserving Technologies in Data