Studying Quality Improvements Recommended via Manual and Automated Code Review

Giuseppe Crupi; Rosalia Tufano; Gabriele Bavota

arXiv:2602.11925·cs.SE·February 13, 2026

TL;DR

This study compares human and AI (ChatGPT-4) code reviews, revealing AI's limitations in identifying quality issues but also its potential as a supplementary tool to human reviewers.

Contribution

It provides a detailed comparison of human and AI code review recommendations, highlighting the strengths and limitations of current DL-based approaches.

Findings

01. ChatGPT recommends 2.4 times more code changes than humans.

02. ChatGPT detects only 10% of issues identified by humans.

03. Approximately 40% of AI suggestions point to meaningful quality issues.
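
The three findings above can be read as simple ratios over labeled review comments. The following is a minimal sketch of that arithmetic, not the authors' analysis pipeline: the `ReviewedPR` record, the issue labels, and the choice to pool counts across all PRs (rather than average per PR) are assumptions made purely for illustration, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedPR:
    """Hypothetical record for one pull request (not the authors' schema)."""
    human_issues: frozenset[str]   # issue labels raised by human reviewers
    ai_issues: frozenset[str]      # issue labels raised by the AI reviewer
    ai_meaningful: frozenset[str]  # AI issues a manual check judged meaningful


def summarize(prs: list[ReviewedPR]) -> dict[str, float]:
    """Pool counts over all PRs and return the three ratios."""
    human_total = sum(len(p.human_issues) for p in prs)
    ai_total = sum(len(p.ai_issues) for p in prs)
    if human_total == 0 or ai_total == 0:
        raise ValueError("need at least one human and one AI recommendation")
    # Issues raised by both a human and the AI on the same PR.
    shared = sum(len(p.human_issues & p.ai_issues) for p in prs)
    # Only count "meaningful" labels that the AI actually raised.
    meaningful = sum(len(p.ai_meaningful & p.ai_issues) for p in prs)
    return {
        "ai_to_human_ratio": ai_total / human_total,
        "human_issues_found_by_ai": shared / human_total,
        "ai_meaningful_share": meaningful / ai_total,
    }


# Invented toy data: two PRs with made-up issue labels.
sample = [
    ReviewedPR(
        human_issues=frozenset({"rename-variable", "missing-null-check"}),
        ai_issues=frozenset({"rename-variable", "add-docstring", "extract-method"}),
        ai_meaningful=frozenset({"rename-variable", "extract-method"}),
    ),
    ReviewedPR(
        human_issues=frozenset({"wrong-log-level"}),
        ai_issues=frozenset({"add-docstring", "magic-number", "unused-import"}),
        ai_meaningful=frozenset({"unused-import"}),
    ),
]

stats = summarize(sample)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under this reading, the first finding corresponds to `ai_to_human_ratio`, the second to `human_issues_found_by_ai`, and the third to `ai_meaningful_share`. Matching issues by exact label is a simplification; in the study the classification comes from manual inspection.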

Abstract

Several Deep Learning (DL)-based techniques have been proposed to automate code review. Still, it is unclear to what extent these approaches can recommend quality improvements as a human reviewer would. We study the similarities and differences between code reviews performed by humans and those automatically generated by DL models, using ChatGPT-4 as a representative of the latter. In particular, we run a mining-based study in which we collect and manually inspect 739 comments posted by human reviewers to suggest code changes in 240 PRs. The manual inspection aims at classifying the type of quality improvement recommended by human reviewers (e.g., rename variable/constant). Then, we ask ChatGPT to perform a code review on the same PRs and we compare the quality improvements it recommends against those suggested by the human reviewers. We show that while, on average, ChatGPT tends to…

Taxonomy

Topics: Software Engineering Research · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI