Identifying Inaccurate Descriptions in LLM-generated Code Comments via Test Execution
Sungmin Kang, Louis Milliken, Shin Yoo

TL;DR
This paper investigates the factual accuracy of LLM-generated code comments, finds existing detection methods ineffective, and proposes a test-based verification approach that correlates well with comment correctness.
Contribution
The paper introduces document testing, which verifies a code comment by executing LLM-generated tests, and shows that it tracks comment correctness where existing detection techniques do not.
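The idea of document testing can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function `clamp`, the two candidate comments, and the hard-coded test lists (which stand in for tests an LLM would generate from each comment) are hypothetical, and the paper's actual prompts, pipeline, and scoring are not described in this summary.

```python
from typing import Callable, List

# Hypothetical function whose comment we want to verify.
def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(value, high))

# Stand-in for LLM output: tests derived only from the comment
# "Returns value limited to the range [low, high]." (accurate)
def tests_from_accurate_comment() -> List[Callable[[], bool]]:
    return [
        lambda: clamp(5, 0, 10) == 5,
        lambda: clamp(-3, 0, 10) == 0,
        lambda: clamp(42, 0, 10) == 10,
    ]

# Stand-in for LLM output: tests derived only from the comment
# "Returns low whenever value is out of range." (inaccurate)
def tests_from_inaccurate_comment() -> List[Callable[[], bool]]:
    return [
        lambda: clamp(5, 0, 10) == 5,
        lambda: clamp(-3, 0, 10) == 0,
        lambda: clamp(42, 0, 10) == 0,  # contradicts the real behaviour
    ]

def pass_rate(tests: List[Callable[[], bool]]) -> float:
    """Run each test against the code; a crash counts as a failure."""
    passed = 0
    for test in tests:
        try:
            if test():
                passed += 1
        except Exception:
            pass
    return passed / len(tests) if tests else 0.0

if __name__ == "__main__":
    print("accurate comment:  ", pass_rate(tests_from_accurate_comment()))
    print("inaccurate comment:", pass_rate(tests_from_inaccurate_comment()))
```

In this sketch, a lower pass rate is treated as a signal that the comment misdescribes the code. Using the raw pass rate as the score is a simplification chosen here, not necessarily the measure used in the paper.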
Findings
Of the three LLMs evaluated, even the best-performing one produced demonstrably inaccurate statements in roughly a fifth of its comments.
Existing code-comment consistency detection techniques show no statistically significant relationship with comment accuracy.
Test-based verification shows a strong statistical relationship with comment correctness.
Abstract
Software comments are critical for human understanding of software, and as such many comment generation techniques have been proposed. However, we find that a systematic evaluation of the factual accuracy of generated comments is rare; only subjective accuracy labels have been given. Evaluating comments generated by three Large Language Models (LLMs), we find that even for the best-performing LLM, roughly a fifth of its comments contained demonstrably inaccurate statements. While it seems code-comment consistency detection techniques should be able to detect inaccurate comments, we perform experiments demonstrating they have no statistically significant relationship with comment accuracy, underscoring the substantial difficulty of this problem. To tackle this, we propose the concept of document testing, in which a document is verified by using an LLM to generate tests based on the…
Taxonomy
Topics: Software Engineering Research · Web Application Security Vulnerabilities · Natural Language Processing Techniques
