REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning

Pawin Taechoyotin; Daniel Acuna

arXiv:2505.11718·cs.AI·June 30, 2025

TL;DR

This paper introduces REMOR, an LLM-based peer review system trained with multi-objective reinforcement learning to generate more substantive, human-aligned reviews. It outperforms existing AI review systems on the reward metrics and produces reviews comparable in quality to human ones.

Contribution

The paper presents a novel multi-aspect reward function, a new dataset (PeerRT), and the REMOR models trained with GRPO, advancing AI peer review quality through reasoning and reinforcement learning.

Findings

01. REMOR-U and REMOR-H outperform state-of-the-art AI review systems in reward metrics.

02. REMOR models generate reviews comparable in quality to human reviews.

03. Reasoning is crucial for improving AI-generated peer reviews.

Abstract

AI-based peer review systems tend to produce shallow and overpraising suggestions compared to human feedback. Here, we evaluate how well a reasoning LLM trained with multi-objective reinforcement learning (REMOR) can overcome these limitations. We start by designing a multi-aspect reward function that aligns with human evaluation of reviews. The aspects are related to the review itself (e.g., criticisms, novelty) and the relationship between the review and the manuscript (i.e., relevance). First, we perform supervised fine-tuning of DeepSeek-R1-Distill-Qwen-7B using LoRA on PeerRT, a new dataset of high-quality top AI conference reviews enriched with reasoning traces. We then apply Group Relative Policy Optimization (GRPO) to train two models: REMOR-H (with the human-aligned reward) and REMOR-U (with a uniform reward). Interestingly, the human-aligned reward penalizes aspects typically…
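
The reward setup described in the abstract can be illustrated with a minimal sketch. It assumes that the per-aspect scores are combined as a weighted sum, that the uniform reward gives every aspect equal weight, and that GRPO scores each sampled review relative to the other reviews sampled for the same manuscript. The aspect list, the score values, and all function names below are hypothetical placeholders, not taken from the paper; the human-aligned weights are not reproduced here because the source does not list them.

```python
from statistics import mean, pstdev

# Hypothetical aspect names: the abstract only gives criticism, novelty,
# and relevance as examples of the aspects in the reward function.
ASPECTS = ("criticism", "novelty", "relevance")


def uniform_weights():
    """Equal weight per aspect (assumed reading of the 'uniform reward')."""
    return {aspect: 1.0 / len(ASPECTS) for aspect in ASPECTS}


def combined_reward(scores, weights):
    """Collapse per-aspect scores into one scalar via a weighted sum (assumption)."""
    return sum(weights[aspect] * scores[aspect] for aspect in ASPECTS)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled reviews.

    This is the group-relative normalization used by GRPO: each reward is
    compared to the mean and standard deviation of its own group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up aspect scores for four candidate reviews of the same manuscript.
group_scores = [
    {"criticism": 0.9, "novelty": 0.2, "relevance": 0.7},
    {"criticism": 0.3, "novelty": 0.3, "relevance": 0.6},
    {"criticism": 0.6, "novelty": 0.6, "relevance": 0.6},
    {"criticism": 0.1, "novelty": 0.1, "relevance": 0.4},
]

weights = uniform_weights()
rewards = [combined_reward(scores, weights) for scores in group_scores]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.3f}  advantage={advantage:+.3f}")
```

Under these assumptions, swapping `uniform_weights()` for a different weight dictionary is the only change needed to move from a uniform reward to a weighted one; reviews that score above their group's mean receive a positive advantage and are reinforced.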

Taxonomy

Topics: Expert finding and Q&A systems · Sentiment Analysis and Opinion Mining · Topic Modeling