Impact of Comments on LLM Comprehension of Legacy Code

Rock Sabetto; Emily Escamilla; Devesh Agarwal; Sujay Kandwal; Justin F. Brunelle; Scott Rosen; Nitin Naik; Samruddhi Thaker; Eric O. Scott; Jacob Zimmer; Amit Madan; Arun Sridharan; Doug Wendt; Michael Doyle; Christopher Glasz; Jasper Phillips; William Macke; Colin Diggs; Michael Bartholf; Zachary Robin; and Paul Ursino

arXiv:2506.11007 · cs.SE · June 16, 2025

Open Access

TL;DR

This paper investigates how comments influence large language models' understanding of legacy code, highlighting the importance of documentation quality and comment presence in improving LLM comprehension of older programming languages.

Contribution

The paper introduces an evaluation method that uses multiple-choice questions to objectively assess LLM understanding of legacy code, and it examines the effects of comment presence and documentation quality.
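As a rough illustration of how multiple-choice scoring can be made objective, the sketch below computes accuracy over a set of questions about code snippets. It is not the authors' implementation: the `MCQ` class, `build_prompt`, `mcqa_accuracy`, the `ask_model` callable, and the two sample questions are all hypothetical names and data invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MCQ:
    """One multiple-choice question about a code snippet (hypothetical schema)."""
    code: str                 # legacy code shown to the model
    question: str             # question about that code
    choices: Dict[str, str]   # option label -> option text
    answer: str               # label of the correct option


def build_prompt(item: MCQ) -> str:
    """Format the snippet, question, and options into a single prompt string."""
    options = "\n".join(
        f"{label}. {text}" for label, text in sorted(item.choices.items())
    )
    return (
        f"{item.code}\n\n"
        f"Question: {item.question}\n"
        f"{options}\n"
        "Answer with a single letter."
    )


def mcqa_accuracy(items: List[MCQ], ask_model: Callable[[str], str]) -> float:
    """Return the fraction of questions whose first reply character matches the key."""
    if not items:
        return 0.0
    correct = 0
    for item in items:
        reply = ask_model(build_prompt(item)).strip().upper()
        if reply[:1] == item.answer.upper():
            correct += 1
    return correct / len(items)


# Stand-in for a real model call (assumption): always answers "A".
def always_a(prompt: str) -> str:
    return "A"


# Made-up sample questions, for illustration only.
items = [
    MCQ(
        code="ADD 1 TO COUNTER.",
        question="What does this statement do?",
        choices={"A": "Increments COUNTER by 1", "B": "Resets COUNTER to 1"},
        answer="A",
    ),
    MCQ(
        code="MOVE ZERO TO TOTAL.",
        question="What does this statement do?",
        choices={"A": "Adds zero to TOTAL", "B": "Sets TOTAL to zero"},
        answer="B",
    ),
]

print(mcqa_accuracy(items, always_a))  # prints 0.5
```

Under this setup, comparing comment conditions would amount to running the same questions against variants of each snippet (for example, with comments removed or altered) and comparing the resulting accuracies. How the paper actually constructs those variants is not described in this listing.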

Findings

01. Comments significantly affect LLM comprehension of legacy code.

02. Inaccurate comments can hinder LLM understanding.

03. Presence of comments generally improves comprehension.

Abstract

Large language models (LLMs) have been increasingly integrated into software engineering and maintenance tasks due to their high performance on such tasks and their robust understanding of modern programming languages. However, the ability of LLMs to comprehend code written in legacy languages remains a research gap, complicated by real-world legacy systems whose documentation is missing or inaccurate, which may impair LLM comprehension. Objectively measuring LLM comprehension of legacy languages requires an efficient, quantitative evaluation method. We leverage multiple-choice question answering (MCQA), an emerging LLM evaluation methodology, to evaluate LLM comprehension of legacy code and the impact of comment prevalence and inaccurate comments. In this work, we…

Taxonomy

Topics: Digital Rights Management and Security · Artificial Intelligence in Law