Identifying Inaccurate Descriptions in LLM-generated Code Comments via Test Execution

Sungmin Kang; Louis Milliken; Shin Yoo

arXiv:2406.14836 · cs.SE · June 24, 2024

Open Access

TL;DR

This paper investigates the factual accuracy of LLM-generated code comments, finds existing detection methods ineffective, and proposes a test-based verification approach that correlates well with comment correctness.

Contribution

It introduces the concept of document testing for verifying code comments and demonstrates its effectiveness over existing detection techniques.

Findings

01 Even for the best-performing of the three LLMs evaluated, roughly a fifth of generated comments contained demonstrably inaccurate statements.

02 Existing code-comment consistency detection techniques show no statistically significant relationship with comment accuracy.

03 Test-based verification shows a strong statistical relationship with comment correctness.

Abstract

Software comments are critical for human understanding of software, and as such many comment generation techniques have been proposed. However, we find that a systematic evaluation of the factual accuracy of generated comments is rare; only subjective accuracy labels have been given. Evaluating comments generated by three Large Language Models (LLMs), we find that even for the best-performing LLM, roughly a fifth of its comments contained demonstrably inaccurate statements. While it seems code-comment consistency detection techniques should be able to detect inaccurate comments, we perform experiments demonstrating they have no statistically significant relationship with comment accuracy, underscoring the substantial difficulty of this problem. To tackle this, we propose the concept of document testing, in which a document is verified by using an LLM to generate tests based on the…
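The general idea of checking a comment by executing tests derived from it can be illustrated with a small sketch. This is not the authors' pipeline, which the excerpt above does not describe in detail. The function `clamp`, the helper `pass_rate`, and both sets of test cases are hypothetical; the cases are written by hand as stand-ins for tests an LLM might generate from each comment, and using the pass rate as the accuracy signal is an assumption made for illustration.

```python
from typing import Callable, Sequence, Tuple

# A test case pairs call arguments with the output the comment implies.
Case = Tuple[tuple, object]


def clamp(value: int, low: int, high: int) -> int:
    # Code under documentation (hypothetical example function).
    return max(low, min(value, high))


def pass_rate(func: Callable, cases: Sequence[Case]) -> float:
    """Run each comment-derived case and return the fraction that pass."""
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            # A crash counts as a failed test, not as a harness error.
            pass
    return passed / len(cases)


# Comment A: "Limits value to the inclusive range [low, high]."
# Hand-written stand-ins for tests generated from comment A.
cases_from_comment_a = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]

# Comment B: "Returns low whenever value is out of range." (inaccurate)
# Hand-written stand-ins for tests generated from comment B.
cases_from_comment_b = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 0)]

score_a = pass_rate(clamp, cases_from_comment_a)
score_b = pass_rate(clamp, cases_from_comment_b)
print(f"comment A pass rate: {score_a:.2f}")
print(f"comment B pass rate: {score_b:.2f}")
```

In this sketch, the tests derived from the inaccurate comment encode its wrong claim about values above `high`, so one of them fails when run against the real code. A lower pass rate is then a hint that the comment, rather than the code, may be at fault.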

Taxonomy

Topics: Software Engineering Research · Web Application Security Vulnerabilities · Natural Language Processing Techniques