TL;DR
REVIEWGROUNDER improves AI-generated peer reviews by combining explicit rubrics with contextual grounding, yielding more substantive, evidence-based feedback that aligns better with human judgments.
Contribution
The paper introduces REVIEWBENCH, a benchmark that evaluates reviews against paper-specific rubrics, and REVIEWGROUNDER, a rubric-guided multi-agent framework that improves review quality by splitting reviewing into drafting and grounding stages.
Findings
On REVIEWBENCH, REVIEWGROUNDER outperforms baselines on review quality metrics.
The framework shows improved alignment with human judgments.
Using larger models further enhances review quality.
Abstract
The rapid rise in AI conference submissions has driven increasing exploration of large language models (LLMs) for peer review support. However, LLM-based reviewers often generate superficial, formulaic comments lacking substantive, evidence-grounded feedback. We attribute this to the underutilization of two key components of human reviewing: explicit rubrics and contextual grounding in existing work. To address this, we introduce REVIEWBENCH, a benchmark evaluating review text according to paper-specific rubrics derived from official guidelines, the paper's content, and human-written reviews. We further propose REVIEWGROUNDER, a rubric-guided, tool-integrated multi-agent framework that decomposes reviewing into drafting and grounding stages, enriching shallow drafts via targeted evidence consolidation. Experiments on REVIEWBENCH show that REVIEWGROUNDER, using a Phi-4-14B-based drafter…
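The two-stage decomposition described in the abstract (draft first, then ground each draft comment in evidence) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `run_review`, `toy_drafter`, and `toy_retriever` are hypothetical, and the toy stand-ins use string templates and substring matching where the paper uses LLM agents and tools.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces, assumed for illustration only.
Drafter = Callable[[str, List[str]], List[Dict[str, str]]]  # (paper, rubric) -> draft comments
Retriever = Callable[[str, str], List[str]]                 # (paper, criterion) -> evidence snippets


def run_review(paper: str, rubric: List[str],
               drafter: Drafter, retriever: Retriever) -> List[Dict]:
    """Draft one comment per rubric criterion, then attach evidence to each."""
    drafts = drafter(paper, rubric)          # stage 1: rubric-guided drafting
    grounded = []
    for draft in drafts:                     # stage 2: grounding
        evidence = retriever(paper, draft["criterion"])
        grounded.append({**draft, "evidence": evidence})
    return grounded


def toy_drafter(paper: str, rubric: List[str]) -> List[Dict[str, str]]:
    """Stand-in for an LLM drafter: emits a template comment per criterion."""
    return [{"criterion": c, "comment": f"Assess the paper's {c}."} for c in rubric]


def toy_retriever(paper: str, criterion: str) -> List[str]:
    """Stand-in for a retrieval tool: sentences that mention the criterion."""
    sentences = [s.strip() for s in paper.split(".") if s.strip()]
    return [s for s in sentences if criterion.lower() in s.lower()]
```

Calling `run_review(paper, ["method", "ablation"], toy_drafter, toy_retriever)` returns one entry per criterion, each carrying the draft comment and the sentences found as evidence. Passing the two stages in as callables keeps the sketch independent of any particular model or tool.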
