On Assessing the Relevance of Code Reviews Authored by Generative Models

Robert Heumüller; Frank Ortmeier

arXiv:2512.15466·cs.SE·December 18, 2025

TL;DR

This paper introduces multi-subjective ranking, an evaluation method in which multiple human judges rank AI-generated code review comments alongside human-written ones. In the study, ChatGPT's comments were ranked significantly better than the top human responses, which supports a more nuanced assessment of AI's role in code review.

Contribution

The paper proposes a novel multi-subjective ranking approach for evaluating code review comments, addressing limitations of existing methods and enabling more meaningful assessments of generative AI performance.

Findings

01 ChatGPT's comments ranked better than human responses

02 ChatGPT surpassed StackExchange's accepted answers

03 The method highlights potential risks of AI in code review

Abstract

The use of large language models like ChatGPT in code review offers promising efficiency gains but also raises concerns about correctness and safety. Existing evaluation methods for code review generation either rely on automatic comparisons to a single ground truth, which fails to capture the variability of human perspectives, or on subjective assessments of "usefulness", a highly ambiguous concept. We propose a novel evaluation approach based on what we call multi-subjective ranking. Using a dataset of 280 self-contained code review requests and corresponding comments from CodeReview StackExchange, multiple human judges ranked the quality of ChatGPT-generated comments alongside the top human responses from the platform. Results show that ChatGPT's comments were ranked significantly better than human ones, even surpassing StackExchange's accepted answers. Going further, our proposed…
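
As a rough illustration of the ranking idea described in the abstract, the sketch below averages the ranks that several judges assign to each comment source. Mean-rank aggregation, the function name `mean_ranks`, the source labels, and the sample data are all assumptions made for illustration. The excerpt above does not specify the authors' aggregation or statistical procedure.

```python
from collections import defaultdict
from statistics import mean


def mean_ranks(judgments):
    """Average the rank each comment source received across judges.

    `judgments` is a list of dicts, one per judge, mapping a source label
    to the rank that judge assigned (1 = best). Hypothetical format.
    """
    ranks_by_source = defaultdict(list)
    for ranking in judgments:
        for source, rank in ranking.items():
            ranks_by_source[source].append(rank)
    return {source: mean(ranks) for source, ranks in ranks_by_source.items()}


# Invented example: three judges rank three comments for one review request.
judgments = [
    {"chatgpt": 1, "accepted_answer": 2, "other_human": 3},
    {"chatgpt": 2, "accepted_answer": 1, "other_human": 3},
    {"chatgpt": 1, "accepted_answer": 3, "other_human": 2},
]

summary = mean_ranks(judgments)
best_source = min(summary, key=summary.get)  # lowest mean rank is best
```

A lower mean rank indicates a source that judges preferred more often. A real analysis would also need a significance test over all requests, which this sketch omits.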

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Software Engineering Research · Ethics and Social Impacts of AI