Performance Comparison of Binary Machine Learning Classifiers in Identifying Code Comment Types: An Exploratory Study

Amila Indika; Peter Y. Washington; Anthony Peruma

arXiv:2303.01035 · cs.SE · March 6, 2023 · 1 cites

TL;DR

This study compares various binary machine learning classifiers for identifying different categories of code comments across three programming languages, highlighting Linear SVC as the most effective with an average F1 score of 0.5474.

Contribution

It constructs 19 binary classifiers for code comment categories across three programming languages and compares the performance of different types of machine learning classifiers, with Linear SVC achieving the highest average F1 score.

Findings

01. Linear SVC achieved the highest average F1 score of 0.5474.

02. Performance varies across classifiers and comment categories.

03. The study covers three different programming languages.

Abstract

Code comments are vital to source code as they help developers with program comprehension tasks. Written in natural language (usually English), code comments convey a variety of information, which is grouped into specific categories. In this study, we construct 19 binary machine learning classifiers for code comment categories that belong to three different programming languages. We present a comparison of performance scores for different types of machine learning classifiers and show that the Linear SVC classifier has the highest average F1 score of 0.5474.
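To make the headline metric concrete, here is a minimal sketch of how an average F1 score over several binary classifiers could be computed. It is not the authors' pipeline: the category names, the toy labels, and the choice of an unweighted mean across categories are all assumptions made for illustration.

```python
from statistics import mean


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1) of one binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(results):
    """Unweighted mean of per-category F1 scores.

    `results` maps a category name to a (y_true, y_pred) pair.
    Averaging this way is an assumption, not taken from the paper.
    """
    return mean(f1_score(t, p) for t, p in results.values())


# Hypothetical toy data: two made-up comment categories, four comments each.
toy_results = {
    "summary": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "usage": ([1, 0, 1, 0], [1, 1, 1, 0]),
}
toy_average = average_f1(toy_results)
```

In a setup like the one the abstract describes, each of the 19 binary classifiers would contribute one entry to `results`, and the comparison between classifier types would repeat this computation once per type.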

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Reliability and Analysis Research