What Makes a Good AI Review? Concern-Level Diagnostics for AI Peer Review

Ming Jin

arXiv:2604.19998 · cs.AI · April 23, 2026

TL;DR

This paper introduces a concern-level diagnostic framework for AI peer review. It evaluates which concerns an AI system identifies, how the system prioritizes them, and whether those priorities align with the review rationale behind the final assessment.

Contribution

The paper proposes a reusable evaluation framework built on match graphs and an evaluation ladder, which together audit concern detection, calibration, and decision-making in AI reviews.

Findings

01

Systems commonly detect concerns; calibration is more often the main constraint.

02

Most systems mark a high percentage of concerns as decisive, yet few treat concerns as true blockers.

03

Different inference methods can reach similar verdicts that mask differences in underlying behavior.

Abstract

Evaluating AI-generated reviews by verdict agreement is widely recognized as insufficient, yet current alternatives rarely audit which concerns a system identifies, how it prioritizes them, or whether those priorities align with the review rationale that shaped the final assessment. We propose concern alignment, a diagnostic framework that evaluates AI reviews at the concern level rather than only at the verdict level. The framework's core data structure is the match graph, a bipartite alignment between official and AI-generated concerns annotated with match type, severity, and post-rebuttal treatment. From this artifact we derive an evaluation ladder that moves from binary accuracy to concern detection, verdict-stratified behavior, decision-aware calibration, and rebuttal-aware decomposition. In a pilot study of four public AI review systems evaluated in six configurations,…
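
To make the match graph concrete, here is a minimal Python sketch of one way such a structure could be represented. It is not the paper's implementation. The class names, field names, label values ("full", "partial", "major", "resolved", and so on), and the two coverage metrics are all assumptions chosen for illustration. The abstract states only that the graph is a bipartite alignment between official and AI-generated concerns, annotated with match type, severity, and post-rebuttal treatment.

```python
from dataclasses import dataclass, field
from enum import Enum


class MatchType(Enum):
    # Hypothetical match types; the paper's actual taxonomy may differ.
    FULL = "full"
    PARTIAL = "partial"


@dataclass(frozen=True)
class MatchEdge:
    """One annotated edge between an official concern and an AI concern."""
    official_id: str
    ai_id: str
    match_type: MatchType
    severity: str        # assumed label, e.g. "major" or "minor"
    post_rebuttal: str   # assumed label, e.g. "resolved" or "unresolved"


@dataclass
class MatchGraph:
    """Bipartite alignment: official concerns on one side, AI concerns on the other."""
    official: set[str]
    ai: set[str]
    edges: list[MatchEdge] = field(default_factory=list)

    def official_coverage(self) -> float:
        """Fraction of official concerns matched by at least one AI concern."""
        if not self.official:
            return 0.0
        matched = {e.official_id for e in self.edges} & self.official
        return len(matched) / len(self.official)

    def ai_coverage(self) -> float:
        """Fraction of AI concerns matched to at least one official concern."""
        if not self.ai:
            return 0.0
        matched = {e.ai_id for e in self.edges} & self.ai
        return len(matched) / len(self.ai)


# Toy example with made-up concern IDs (not data from the paper).
example = MatchGraph(
    official={"O1", "O2", "O3", "O4"},
    ai={"A1", "A2"},
    edges=[
        MatchEdge("O1", "A1", MatchType.FULL, "major", "unresolved"),
        MatchEdge("O2", "A1", MatchType.PARTIAL, "minor", "resolved"),
    ],
)
```

In this toy graph, one AI concern covers two of the four official concerns, while the second AI concern matches nothing. Keeping severity and post-rebuttal treatment on the edges means that later rungs of an evaluation ladder, such as calibration checks, could be computed from the same artifact by filtering edges rather than re-annotating.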
