Characterising the Knowledge about Primitive Variables in Java Code Comments
Mahfouth Alghamdi, Shinpei Hayashi, Takashi Kobayashi, Christoph Treude

TL;DR
This study detects and analyzes comments about primitive variables in Java code using lexical and advanced matching, and finds that developers document numeric types more thoroughly than String or boolean types.
Contribution
The paper presents a novel approach combining lexical and advanced matching to accurately identify and classify comments about primitive variables in source code.
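This summary does not spell out the matching rules, so the following is only a minimal sketch of what a lexical match between a variable name and a comment could look like. It assumes a whole-word, case-insensitive comparison. The class `LexicalMatchSketch` and the method `mentions` are hypothetical names chosen for illustration and are not taken from the paper.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; not the authors' implementation.
class LexicalMatchSketch {

    // Hypothetical helper: returns true when the comment text contains the
    // variable name as a whole word, ignoring case. The name is quoted so
    // that any regex metacharacters in it are treated literally.
    public static boolean mentions(String comment, String variableName) {
        Pattern wholeWord = Pattern.compile(
                "\\b" + Pattern.quote(variableName) + "\\b",
                Pattern.CASE_INSENSITIVE);
        return wholeWord.matcher(comment).find();
    }

    public static void main(String[] args) {
        String documented = "// maxRetries: number of attempts before giving up";
        String undocumented = "// number of attempts before giving up";
        System.out.println(mentions(documented, "maxRetries"));
        System.out.println(mentions(undocumented, "maxRetries"));
    }
}
```

A purely lexical check like this misses comments that describe a variable without naming it, which is the kind of gap a more advanced matching step would be expected to close.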
Findings
Advanced matching outperforms lexical matching in detection accuracy
Developers more frequently document purpose and concept for numeric variables
Boolean variables and certain fields are less well documented
Abstract
Primitive types are fundamental components of any programming language and serve as the building blocks of data manipulation. Understanding the role of these types in source code is essential for writing software. Little work has been conducted on how often variables of primitive types are documented in code comments and what types of knowledge the comments provide about them. In this paper, we present an approach for detecting primitive variables and their descriptions in comments using lexical matching and advanced matching. We evaluate both approaches by comparing their recall, precision, and F-score against 600 manually annotated variables from a sample of GitHub projects. The advanced approach achieved a higher F-score than lexical matching (0.986 versus 0.942). We then…
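For readers unfamiliar with the evaluation metrics named in the abstract, the sketch below computes precision, recall, and F-score from counts of true positives, false positives, and false negatives, using the standard definitions. The class `MatchingMetricsSketch` is a hypothetical name, and the counts in `main` are made-up illustrative values, not results from the paper.

```java
// Illustrative sketch of the standard metric definitions.
class MatchingMetricsSketch {

    // Precision: share of detected matches that are correct.
    public static double precision(int truePositives, int falsePositives) {
        return truePositives / (double) (truePositives + falsePositives);
    }

    // Recall: share of annotated matches that were detected.
    public static double recall(int truePositives, int falseNegatives) {
        return truePositives / (double) (truePositives + falseNegatives);
    }

    // F-score: harmonic mean of precision and recall.
    public static double fScore(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only; not taken from the paper.
        int tp = 8, fp = 2, fn = 2;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.printf("precision=%.3f recall=%.3f F=%.3f%n", p, r, fScore(p, r));
    }
}
```

The sketch does not guard against empty denominators: if a matcher detects nothing at all, the divisions yield NaN, and a real evaluation script would need to handle that case explicitly.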
