REM-CTX: Automated Peer Review via Reinforcement Learning with Auxiliary Context
Pawin Taechoyotin, Daniel E. Acuna

TL;DR
REM-CTX is a reinforcement learning system for automated peer review that integrates auxiliary visual and scholarly context, producing reviews of higher quality and better contextual grounding than the baselines it is compared against.
Contribution
It introduces a novel reinforcement learning approach with correspondence-aware rewards that incorporate auxiliary context into automated peer review generation.
Findings
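REM-CTX achieves the highest overall review quality among six baselines on manuscripts from the Computer, Biological, and Physical Sciences.
It outperforms systems built on substantially larger commercial models in overall review quality, and surpasses the next-best RL baseline on both quality and contextual grounding metrics.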
Ablation studies show the importance of the two correspondence rewards for targeted improvements.
Abstract
Most automated peer review systems rely on textual manuscript content alone, leaving visual elements such as figures and external scholarly signals underutilized. We introduce REM-CTX, a reinforcement-learning system that incorporates auxiliary context into the review generation process via correspondence-aware reward functions. REM-CTX trains an 8B-parameter language model with Group Relative Policy Optimization (GRPO) and combines a multi-aspect quality reward with two correspondence rewards that explicitly encourage alignment with auxiliary context. Experiments on manuscripts across Computer, Biological, and Physical Sciences show that REM-CTX achieves the highest overall review quality among six baselines, outperforming other systems with substantially larger commercial models, and surpassing the next-best RL baseline across both quality and contextual grounding metrics. Ablation…
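The reward design described in the abstract can be illustrated with a minimal sketch. It assumes the multi-aspect quality reward and the two correspondence rewards are combined as a weighted sum, and that the two correspondence rewards map to visual and scholarly context; the excerpt states neither, so the weights and the names `combined_reward`, `visual_corr`, and `scholarly_corr` are hypothetical. The advantage step is the standard GRPO group normalization (reward minus the group mean, divided by the group standard deviation), not code taken from the paper.

```python
from statistics import mean, pstdev


def combined_reward(quality, visual_corr, scholarly_corr,
                    w_quality=1.0, w_visual=0.5, w_scholarly=0.5):
    """Weighted sum of one quality score and two correspondence scores.

    The weighted-sum form and the weights are assumptions made for
    illustration; the paper's actual combination rule is not given here.
    """
    return (w_quality * quality
            + w_visual * visual_corr
            + w_scholarly * scholarly_corr)


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: normalize each reward within its group.

    `rewards` holds the scalar rewards of all reviews sampled for the
    same manuscript. `eps` avoids division by zero when every review in
    the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical scores for three sampled reviews of one manuscript:
    # (quality, visual correspondence, scholarly correspondence).
    group = [(0.8, 0.2, 0.4), (0.6, 0.9, 0.7), (0.5, 0.5, 0.5)]
    rewards = [combined_reward(*scores) for scores in group]
    for scores, adv in zip(group, group_relative_advantages(rewards)):
        print(scores, round(adv, 3))
```

Under these assumed weights, a review that grounds itself well in the auxiliary context can receive a higher advantage than one with a better quality score alone, which is the kind of behavior the correspondence rewards are meant to encourage.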
