Analyzing Code Comments to Boost Program Comprehension

Yusuke Shinyama; Yoshitaka Arahori; Katsuhiko Gondow

arXiv:1905.02050 · cs.SE · March 18, 2022

Open Access

TL;DR

This paper presents a method for identifying explanatory comments in source code with a decision-tree classifier. It targets local comments, those written inside a method or function, with the aim of improving program comprehension.

Contribution

The paper introduces eleven comment categories and a decision-tree classifier that identifies explanatory comments in Java and Python projects with 60% precision and 80% recall.
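As a reminder of what the two reported figures measure, here is a minimal sketch of the standard precision and recall definitions. The counts are hypothetical, chosen only so that the arithmetic reproduces 60% and 80%; they are not taken from the paper.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts (an assumption for illustration, not the paper's data):
# 30 comments are truly explanatory; the classifier flags 40 comments,
# 24 of them correctly, so 16 are false alarms and 6 are missed.
p, r = precision_recall(true_pos=24, false_pos=16, false_neg=6)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.80
```

Read this way, 60% precision means roughly two in five flagged comments are not actually explanatory, while 80% recall means about one in five explanatory comments goes unflagged.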

Findings

01 Preconditional and postconditional comments are the most common.

02 English comments exhibit consistent grammatical structures.

03 The study analyzed 2,000 GitHub projects written in Java and Python.

Abstract

We aim to find source code comments that help programmers understand a nontrivial part of the source code. One example is a comment explaining that assigning a zero is a way to "clear" a buffer. Such comments are invaluable to programmers, and identifying them correctly would be of great help. Toward this goal, we developed a method to discover explanatory comments in source code. We first propose eleven distinct categories of code comments. We then develop a decision-tree-based classifier that identifies explanatory comments with 60% precision and 80% recall. We analyzed 2,000 GitHub projects written in two languages: Java and Python. This task is novel in that it focuses on a microscopic comment (a "local comment") within a method or function, in contrast to prior efforts that focused on API- or method-level comments. We also investigated how different category…
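To make the notion of a "local comment" concrete, the following is a minimal sketch of how comments inside function bodies could be collected from Python source. It is not the authors' pipeline: the function name `local_comments` and the sample snippet are hypothetical, and the approach simply combines the standard `ast` and `tokenize` modules (Python 3.8 or later, for `end_lineno`).

```python
import ast
import io
import tokenize


def local_comments(source):
    """Return (line_number, text) pairs for comments inside function bodies."""
    tree = ast.parse(source)
    # Record the line span of each function definition: (def line, last line).
    spans = [
        (node.lineno, node.end_lineno)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        line = tok.start[0]
        # Keep comments strictly after the def line and within the function span.
        if any(start < line <= end for start, end in spans):
            found.append((line, tok.string.lstrip("#").strip()))
    return found


# Hypothetical input, echoing the abstract's buffer-clearing example.
SAMPLE = '''\
# module-level comment
def reset(buf):
    # clear the buffer
    buf[0] = 0
    return buf
'''

print(local_comments(SAMPLE))  # [(3, 'clear the buffer')]
```

The module-level comment on line 1 is skipped because it lies outside any function. One limitation of this sketch is that a comment placed after the last statement of a function body falls outside the span reported by `ast` and is therefore missed.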

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Software Engineering Techniques and Practices