# Analyzing Code Comments to Boost Program Comprehension

**Authors:** Yusuke Shinyama, Yoshitaka Arahori, Katsuhiko Gondow

arXiv: 1905.02050 · 2022-03-18

## TL;DR

This paper presents a method for identifying explanatory code comments with a decision-tree classifier. It targets local comments inside a method or function, with the aim of improving program comprehension.

## Contribution

The paper proposes eleven categories of code comments and a decision-tree classifier that identifies explanatory comments with 60% precision and 80% recall in Java and Python projects.
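To make the two reported figures concrete, the sketch below computes precision and recall from raw counts. The counts are hypothetical, chosen only so that the arithmetic reproduces 60% and 80%; they are not taken from the paper, and `precision_recall` is an illustrative helper rather than the authors' code.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp)  # share of flagged comments that really are explanatory
    recall = tp / (tp + fn)     # share of explanatory comments that were flagged
    return precision, recall


# Hypothetical counts (not from the paper): 200 comments flagged as explanatory,
# 120 of them correctly, while 30 explanatory comments were missed.
p, r = precision_recall(tp=120, fp=80, fn=30)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under these assumed counts, four in ten flagged comments would be false alarms, while only one in five explanatory comments would be missed.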

## Key findings

- Two types of comments dominate: preconditional and postconditional.
- Many English comments share a grammatical structure that is consistent across projects.
- The analysis covered 2,000 GitHub projects written in Java and Python.

## Abstract

We are trying to find source code comments that help programmers understand a nontrivial part of source code. One such example is a comment explaining that a zero is assigned in order to "clear" a buffer. Such comments are invaluable to programmers, and identifying them correctly would be of great help. Toward this goal, we developed a method to discover explanatory code comments in source code. We first propose eleven distinct categories of code comments. We then developed a decision-tree-based classifier that can identify explanatory comments with 60% precision and 80% recall. We analyzed 2,000 GitHub projects written in two languages: Java and Python. This task is novel in that it focuses on a microscopic comment ("local comment") within a method or function, in contrast to prior efforts that focused on API- or method-level comments. We also investigated how different categories of comments are used in different projects. Our key finding is that there are two dominant types of comments: preconditional and postconditional. Our findings also suggest that many English code comments have a certain grammatical structure that is consistent across different projects.
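As an illustration of what a "local comment" looks like, the sketch below pulls comments out of a small Python snippet with the standard-library `tokenize` module. The snippet, the `reset` function, and the `extract_comments` helper are all hypothetical, and this is not the authors' extraction pipeline. The helper returns every `#` comment without distinguishing local comments from method-level ones; the sample simply contains only a local one.

```python
import io
import tokenize

# Hypothetical sample: a local comment explaining that assigning zero "clears" the buffer.
SAMPLE = '''\
def reset(buf):
    # Clear the buffer.
    buf[0] = 0
    return buf
'''


def extract_comments(source):
    """Return (line_number, text) for every '#' comment found in the source string."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.start[0], tok.string) for tok in tokens if tok.type == tokenize.COMMENT]


print(extract_comments(SAMPLE))  # [(2, '# Clear the buffer.')]
```

A classifier such as the one described in the abstract would then take comments like this one as input and decide which category they belong to.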

---
Source: https://tomesphere.com/paper/1905.02050