R-Align: Enhancing Generative Reward Models through Rationale-Centric Meta-Judging

Yanlin Lai; Mitt Huang; Hangyu Guo; Xiangfeng Wang; Haodong Li; Shaoxiong Zhan; Liang Zhao; Chengyuan Yao; Yinmin Zhang; Qi Han; Chun Yuan; Zheng Ge; Xiangyu Zhang; Daxin Jiang

arXiv:2602.06763 · cs.CL · February 9, 2026

Open Access

TL;DR

This paper introduces R-Align, a method to improve generative reward models by explicitly supervising rationale alignment, leading to better alignment with human preferences and improved downstream task performance.

Contribution

It demonstrates that reasoning fidelity predicts RLHF success and proposes R-Align to enhance rationale consistency and model alignment.

Findings

01. Rationale fidelity strongly predicts RLHF outcomes.
02. R-Align reduces spurious correctness in reward models.
03. R-Align improves performance across multiple tasks.

Abstract

Reinforcement Learning from Human Feedback (RLHF) remains indispensable for aligning large language models (LLMs) in subjective domains. To enhance robustness, recent work shifts toward Generative Reward Models (GenRMs) that generate rationales before predicting preferences. Yet in GenRM training and evaluation, practice remains outcome-label-only, leaving reasoning quality unchecked. We show that reasoning fidelity, the consistency between a GenRM's preference decision and reference decision rationales, is highly predictive of downstream RLHF outcomes, beyond standard label accuracy. Specifically, we repurpose existing reward-model benchmarks to compute Spurious Correctness (S-Corr), the fraction of label-correct decisions with rationales misaligned with golden judgments. Our empirical evaluation reveals substantial S-Corr even for competitive GenRMs, and higher S-Corr is associated with…
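As a rough illustration of the S-Corr definition in the abstract, the sketch below computes it from per-example judgments. It assumes each judgment already carries two booleans: whether the predicted preference label matched the reference, and whether the rationale was judged aligned with the golden judgment. How that alignment is decided is not specified in this excerpt, so `rationale_aligned` is a hypothetical precomputed flag, and the names `Judgment`, `spurious_correctness`, and `label_accuracy` are illustrative only. The sketch also assumes the denominator is the set of label-correct decisions, which is one reading of "the fraction of label-correct decisions".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Judgment:
    """One GenRM decision on a benchmark example (hypothetical record)."""
    label_correct: bool      # predicted preference matches the reference label
    rationale_aligned: bool  # rationale agrees with the golden judgment (assumed given)


def label_accuracy(judgments: Sequence[Judgment]) -> float:
    """Standard outcome-only accuracy: share of label-correct decisions."""
    if not judgments:
        return 0.0
    return sum(j.label_correct for j in judgments) / len(judgments)


def spurious_correctness(judgments: Sequence[Judgment]) -> float:
    """S-Corr under the stated assumption: among label-correct decisions,
    the share whose rationale is NOT aligned with the golden judgment."""
    correct = [j for j in judgments if j.label_correct]
    if not correct:
        return 0.0  # no label-correct decisions, so nothing can be spurious
    spurious = sum(1 for j in correct if not j.rationale_aligned)
    return spurious / len(correct)


# Toy data, invented purely for illustration (not results from the paper).
sample = [
    Judgment(label_correct=True, rationale_aligned=True),
    Judgment(label_correct=True, rationale_aligned=False),
    Judgment(label_correct=False, rationale_aligned=False),
    Judgment(label_correct=True, rationale_aligned=True),
]

print(f"accuracy={label_accuracy(sample):.2f}  s_corr={spurious_correctness(sample):.2f}")
```

Under this reading, two models with the same label accuracy can differ in S-Corr, which is the gap that outcome-label-only evaluation leaves unmeasured.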

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications