From Industry Claims to Empirical Reality: An Empirical Study of Code Review Agents in Pull Requests

Kowshik Chowdhury; Dipayan Banik; K M Ferdous; Shazibul Islam Shamim

arXiv:2604.03196 · cs.SE · April 6, 2026

TL;DR

This study empirically evaluates autonomous code review agents (CRAs) in pull requests (PRs). It finds that CRAs often produce low-signal feedback, and that this feedback is associated with higher abandonment rates than human review.

Contribution

The paper provides the first empirical analysis of CRA review quality and its impact on PR outcomes, and it highlights the importance of human oversight in automated code review.

Findings

01

CRA-only PRs have a 45.20% merge rate, 23.17 percentage points below the 68.37% merge rate of human-only PRs.

02

Most CRA-only PRs exhibit low signal-to-noise ratios, indicating noisy feedback.

03

High abandonment rates are associated with low-signal CRA feedback.
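
To make the merge-rate comparison in the findings concrete, the following is a minimal sketch of how merge rates per reviewer group could be computed and compared. It is not the authors' pipeline: the record fields `reviewers` and `merged`, the group labels, and the toy records are all hypothetical and invented for illustration. Only the two percentages in the last step come from the findings above.

```python
from collections import defaultdict


def merge_rates(prs):
    """Return {reviewer_group: merge rate in percent} for an iterable of PR records.

    Each record is a dict with the hypothetical keys "reviewers" (group label)
    and "merged" (bool). These names are assumptions, not the dataset's schema.
    """
    totals = defaultdict(int)
    merged = defaultdict(int)
    for pr in prs:
        totals[pr["reviewers"]] += 1
        if pr["merged"]:
            merged[pr["reviewers"]] += 1
    return {group: 100.0 * merged[group] / totals[group] for group in totals}


# Toy records, invented for illustration only (not data from the paper).
toy_prs = [
    {"reviewers": "cra_only", "merged": True},
    {"reviewers": "cra_only", "merged": False},
    {"reviewers": "human_only", "merged": True},
    {"reviewers": "human_only", "merged": True},
    {"reviewers": "human_only", "merged": False},
    {"reviewers": "human_only", "merged": True},
]

rates = merge_rates(toy_prs)
# Difference between the two toy merge rates, in percentage points.
gap_toy = rates["human_only"] - rates["cra_only"]

# Gap between the two merge rates reported in the findings, in percentage points.
reported_gap = round(68.37 - 45.20, 2)
```

The gap is expressed in percentage points rather than as a relative change, since both quantities are already rates over different PR populations.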

Abstract

Autonomous coding agents are generating code at an unprecedented scale, with OpenAI Codex alone creating over 400,000 pull requests (PRs) in two months. As agentic PR volumes increase, code review agents (CRAs) have become routine gatekeepers in development workflows. Industry reports claim that CRAs can manage 80% of PRs in open source repositories without human involvement. As a result, understanding the effectiveness of CRA reviews is crucial for maintaining developmental workflows and preventing wasted effort on abandoned pull requests. However, empirical evidence on how CRA feedback quality affects PR outcomes remains limited. The goal of this paper is to help researchers and practitioners understand when and how CRAs influence PR merge success by empirically analyzing reviewer composition and the signal quality of CRA-generated comments. From AIDev's 19,450 PRs, we analyze 3,109…
