On Assessing the Relevance of Code Reviews Authored by Generative Models
Robert Heumüller, Frank Ortmeier

TL;DR
This paper introduces multi-subjective ranking, an evaluation method for AI-generated code reviews, and reports that human judges ranked ChatGPT's comments significantly better than the top human responses on CodeReview StackExchange.
Contribution
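The paper proposes a novel multi-subjective ranking approach for evaluating code review comments. It addresses two limitations of existing methods: automatic comparison against a single ground truth, which misses the variability of human perspectives, and subjective ratings of "usefulness", which is a highly ambiguous concept.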
Findings
ChatGPT's comments ranked better than human responses
ChatGPT surpassed StackExchange's accepted answers
The method highlights potential risks of AI in code review
Abstract
The use of large language models like ChatGPT in code review offers promising efficiency gains but also raises concerns about correctness and safety. Existing evaluation methods for code review generation either rely on automatic comparisons to a single ground truth, which fails to capture the variability of human perspectives, or on subjective assessments of "usefulness", a highly ambiguous concept. We propose a novel evaluation approach based on what we call multi-subjective ranking. Using a dataset of 280 self-contained code review requests and corresponding comments from CodeReview StackExchange, multiple human judges ranked the quality of ChatGPT-generated comments alongside the top human responses from the platform. Results show that ChatGPT's comments were ranked significantly better than human ones, even surpassing StackExchange's accepted answers. Going further, our proposed…
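As a rough illustration of how rankings from several judges might be combined, the sketch below averages the rank each comment receives across judges. This is only one possible aggregation and is not taken from the paper; the function name `mean_ranks`, the comment labels, and the rankings themselves are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_ranks(rankings):
    """Average the rank given to each comment across all judges.

    `rankings` is a list with one dict per judge, mapping a comment
    label to the rank that judge assigned (1 = best). The name and the
    data layout are assumptions made for this sketch.
    """
    collected = defaultdict(list)
    for ranking in rankings:
        for label, rank in ranking.items():
            collected[label].append(rank)
    return {label: mean(ranks) for label, ranks in collected.items()}


# Invented toy data: three judges rank the same three comments
# on a single review request. These are not results from the paper.
toy_rankings = [
    {"comment_a": 1, "comment_b": 2, "comment_c": 3},
    {"comment_a": 2, "comment_b": 1, "comment_c": 3},
    {"comment_a": 1, "comment_b": 3, "comment_c": 2},
]

averaged = mean_ranks(toy_rankings)
# The comment with the lowest mean rank was preferred on average.
best = min(averaged, key=averaged.get)
```

A lower mean rank means the judges preferred that comment on average. Claiming that one source is ranked significantly better than another, as the abstract does, would additionally require a statistical test over many review requests, which this sketch does not attempt.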
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Software Engineering Research · Ethics and Social Impacts of AI
