REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning
Pawin Taechoyotin, Daniel Acuna

TL;DR
This paper introduces REMOR, an LLM-based peer review system trained with multi-objective reinforcement learning to generate more substantive, human-aligned reviews. It outperforms existing AI review systems and produces reviews comparable in quality to human ones.
Contribution
The paper presents a novel multi-aspect reward function, a new dataset PeerRT, and the REMOR models trained with GRPO, advancing AI peer review quality through reasoning and reinforcement learning.
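As a rough illustration of what a multi-aspect reward could look like, the sketch below combines per-aspect scores into one scalar with a weighted sum. This is an assumption made for illustration, not the paper's actual reward: the function names, the aspect keys, the scores, and every weight value are hypothetical, and "uniform" is interpreted here as equal weights.

```python
from typing import Dict, Iterable


def uniform_weights(aspects: Iterable[str]) -> Dict[str, float]:
    """Give every aspect the same weight (assumed meaning of a 'uniform' reward)."""
    names = list(aspects)
    return {name: 1.0 / len(names) for name in names}


def combine_rewards(aspect_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Collapse per-aspect scores into one scalar reward via a weighted sum."""
    return sum(weights[name] * score for name, score in aspect_scores.items())


# Hypothetical per-aspect scores for a single generated review.
scores = {"criticism": 0.5, "novelty": 0.25, "relevance": 1.0}

# Hypothetical weights standing in for a human-aligned weighting.
# Real weights would be fit to human judgments and could be negative.
human_like = {"criticism": 0.5, "novelty": 0.25, "relevance": 0.25}

uniform_reward = combine_rewards(scores, uniform_weights(scores))
human_reward = combine_rewards(scores, human_like)
```

Under this reading, the two REMOR variants would differ only in the weight vector passed to the same combination step.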
Findings
REMOR-U and REMOR-H outperform state-of-the-art AI review systems in reward metrics.
REMOR models generate reviews comparable in quality to human reviews.
Reasoning is crucial for improving AI-generated peer reviews.
Abstract
AI-based peer review systems tend to produce shallow and overpraising suggestions compared to human feedback. Here, we evaluate how well a reasoning LLM trained with multi-objective reinforcement learning (REMOR) can overcome these limitations. We start by designing a multi-aspect reward function that aligns with human evaluation of reviews. The aspects are related to the review itself (e.g., criticisms, novelty) and the relationship between the review and the manuscript (i.e., relevance). First, we perform supervised fine-tuning of DeepSeek-R1-Distill-Qwen-7B using LoRA on PeerRT, a new dataset of high-quality top AI conference reviews enriched with reasoning traces. We then apply Group Relative Policy Optimization (GRPO) to train two models: REMOR-H (with the human-aligned reward) and REMOR-U (with a uniform reward). Interestingly, the human-aligned reward penalizes aspects typically…
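The GRPO step named in the abstract scores each sampled review relative to the other reviews sampled for the same manuscript, rather than against a learned value function. A minimal sketch of that group-relative normalization follows. The function name, the `eps` term, and the use of the population standard deviation are assumptions for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and standard deviation of its group.

    `rewards` holds the scalar rewards of all reviews sampled for one manuscript.
    `eps` (an assumption) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for three reviews sampled for the same manuscript.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Reviews that score above their group's mean receive a positive advantage and are reinforced; those below the mean are discouraged.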
Taxonomy
Topics: Expert finding and Q&A systems · Sentiment Analysis and Opinion Mining · Topic Modeling
