Hold On! Is My Feedback Useful? Evaluating the Usefulness of Code Review Comments
Sharif Ahmed, Nasir U. Eisty

TL;DR
This paper evaluates methods for predicting the usefulness of code review comments on three datasets drawn from open-source and commercial projects. Both GPT-4o and a featureless Bag-of-Words model with TF-IDF outperform the baseline.
Contribution
The paper introduces new textual features from software and non-software domains and compares feature-based, bag-of-words, and transfer learning approaches, including large language models, for predicting comment usefulness in code reviews.
Findings
GPT-4o achieves state-of-the-art performance
Featureless Bag-of-Words approach is highly effective
Models generalize across open-source and commercial datasets
Abstract
Context: In collaborative software development, the peer code review process proves beneficial only when the reviewers provide useful comments. Objective: This paper investigates the usefulness of Code Review Comments (CR comments) through textual feature-based and featureless approaches. Method: We select three available datasets from both open-source and commercial projects. Additionally, we introduce new features from software and non-software domains. Moreover, we experiment with the presence of jargon, voice, and code in CR comments and classify the usefulness of CR comments through featurization, bag-of-words, and transfer learning techniques. Results: Our models outperform the baseline by achieving state-of-the-art performance. Furthermore, the results demonstrate that the large commercial LLM, GPT-4o, or the simple non-commercial featureless approach, Bag-of-Words with TF-IDF, is…
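To make the featureless approach named in the abstract concrete, here is a minimal sketch of Bag-of-Words with TF-IDF weighting applied to comment classification. It is an illustration only, not the authors' pipeline: the toy comments, the label names, the nearest-centroid classifier, and every function name below are assumptions chosen to keep the example dependency-free. The paper's actual classifier and preprocessing are not described in this summary.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def fit_idf(docs):
    # Smoothed inverse document frequency over the training comments.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf_vector(text, idf):
    # Bag-of-words term counts, weighted by IDF and L2-normalised.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def centroid(vectors):
    # Mean TF-IDF vector of one class.
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(comments, labels):
    # Hypothetical stand-in classifier: one centroid per label.
    idf = fit_idf(comments)
    by_label = {}
    for comment, label in zip(comments, labels):
        by_label.setdefault(label, []).append(tfidf_vector(comment, idf))
    return idf, {label: centroid(vs) for label, vs in by_label.items()}


def predict(model, comment):
    idf, centroids = model
    vec = tfidf_vector(comment, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy data invented for this sketch; not taken from the paper's datasets.
comments = [
    "This loop can throw a null pointer exception when the list is empty; add a guard.",
    "Rename this variable and extract the duplicated parsing logic into a helper function.",
    "Looks good to me.",
    "Thanks, nice work!",
]
labels = ["useful", "useful", "not_useful", "not_useful"]
model = train(comments, labels)
```

Calling `predict(model, "Add a null check before the loop")` scores the new comment against each class centroid and returns the closer label. "Featureless" here means the model sees only word weights, with no hand-crafted features. A real evaluation would need a labelled dataset and a held-out test split.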
