Measuring the effectiveness of code review comments in GitHub repositories: A machine learning approach

Shadikur Rahman; Umme Ayman Koana; Hasibul Karim Shanto; Mahmuda Akter; Chitra Roy; and Aras M. Ismael

arXiv:2508.16053·cs.SE·August 25, 2025

TL;DR

This study evaluates machine learning techniques for classifying GitHub code review comments by semantic meaning and sentiment polarity, aiming to improve developer understanding and error detection in open-source projects.

Contribution

It compares seven machine learning algorithms for classifying code review comments and identifies Linear SVC as the most accurate method for sentiment analysis.

Findings

01. Linear SVC achieves the highest accuracy among the tested algorithms.

02. Manual labeling of 13,557 comments provides a substantial dataset.

03. The approach aids programmers in understanding and addressing code review comments.

Abstract

This paper presents an empirical study of how effectively machine learning techniques classify code review text by semantic meaning. Code review comments were extracted from the GitHub source control repositories of three open-source projects, covering development activity during the existing year. Programmers need to be aware of problems in their code and to have their errors pointed out, so classifying the sentiment polarity of code review comments is necessary to avoid errors. We manually labelled 13,557 code review comments generated by three open-source projects on GitHub during the existing year. To recognize the sentiment polarity (or sentiment orientation) of code reviews, we use seven machine learning algorithms and compare their results to find the best-performing ones. Among those, the Linear Support Vector Classifier (SVC) technique achieves…
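To make the classification task concrete, here is a minimal sketch of a linear SVM sentiment classifier for review comments. It is not the authors' pipeline. The six training comments, the binary positive/negative label set, the whitespace tokenizer, and all function names are hypothetical, and a small pure-Python hinge-loss trainer (with no bias term) stands in for a library implementation such as scikit-learn's `LinearSVC`.

```python
from collections import Counter

# Hypothetical toy data: (comment, label) with +1 = positive, -1 = negative.
# These are illustrative only and are not taken from the paper's dataset.
TRAIN = [
    ("looks good to me thanks", 1),
    ("nice clean refactor", 1),
    ("great catch well done", 1),
    ("this breaks the build", -1),
    ("wrong logic here", -1),
    ("confusing and buggy code", -1),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, kept deliberately simple)."""
    return text.lower().split()


# Vocabulary: token -> feature index, sorted for deterministic ordering.
VOCAB = {tok: i for i, tok in enumerate(
    sorted({t for text, _ in TRAIN for t in tokenize(text)}))}


def featurize(text):
    """Sparse bag-of-words vector {feature_index: count}; unknown tokens are dropped."""
    counts = Counter(tokenize(text))
    return {VOCAB[t]: c for t, c in counts.items() if t in VOCAB}


def train_linear_svm(samples, labels, n_features, epochs=20, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * sum(w[i] * v for i, v in x.items())
            # L2 regularization: shrink every weight slightly.
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge loss is active only when the margin is below 1.
            if margin < 1:
                for i, v in x.items():
                    w[i] += lr * y * v
    return w


X = [featurize(text) for text, _ in TRAIN]
Y = [label for _, label in TRAIN]
WEIGHTS = train_linear_svm(X, Y, len(VOCAB))


def classify(text):
    """Return 'positive' or 'negative' from the sign of the linear score."""
    score = sum(WEIGHTS[i] * v for i, v in featurize(text).items())
    return "positive" if score >= 0 else "negative"


# Training accuracy on the toy set (not a held-out evaluation).
accuracy = sum(
    classify(text) == ("positive" if label == 1 else "negative")
    for text, label in TRAIN
) / len(TRAIN)

print(classify("nice refactor thanks"))
print(classify("buggy build"))
print(accuracy)
```

A comparison across several algorithms, as the abstract describes, would train each candidate on the same labelled comments and report accuracy on a held-out split rather than on the training set as done here.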
