IRPM: Intergroup Relative Preference Modeling for Pointwise Generative Reward Models

Haonan Song; Qingchen Xie; Huan Zhu; Feng Xiao; Luxi Xing; Liu Kang; Fuzhen Li; Zhiyong Zheng; Feng Jiang; Ziheng Li; Kun Yan; Qingyi Si; Yanghua Xiao; Hongcheng Guo; Fan Yang

arXiv:2601.00677 · cs.LG · February 2, 2026

Open Access

TL;DR

IRPM is a scalable, interpretable method for training pointwise generative reward models for RLHF. It reduces reward evaluation over n candidates from O(n^2) pairwise judgments to O(n) pointwise scores and outperforms existing pointwise GRMs on benchmark datasets.

Contribution

IRPM extends the Bradley-Terry paradigm to intergroup comparisons, enabling efficient pointwise reward estimation from pairwise preferences.

Findings

01. IRPM outperforms existing pointwise GRMs on benchmark datasets.

02. IRPM approaches the performance of pairwise GRMs.

03. IRPM significantly reduces computational costs during RL training.

Abstract

Generative Reward Models (GRMs) have demonstrated strong performance in reward modeling due to their interpretability and potential for refinement through reinforcement learning (RL). However, widely used pairwise GRMs create a computational bottleneck in reinforcement learning from human feedback (RLHF): calibrating or aggregating preference signals over n candidates often incurs O(n^2) pairwise judgments. To address this issue, we propose Intergroup Relative Preference Modeling (IRPM), an RL-based method that extends the Bradley-Terry preference-learning paradigm via intergroup comparisons to train pointwise GRMs from pairwise preference data. IRPM derives a pointwise reward for each response by contrasting groups of chosen vs. rejected samples, enabling pointwise scores that are comparable across candidate sets and O(n) reward evaluation for a variable number of candidates during RL…
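
The cost argument in the abstract can be illustrated with a minimal Python sketch. It is not the IRPM training objective, which this page does not spell out. It shows only the standard Bradley-Terry preference probability for two scalar scores, and a count of how many judgments a pairwise judge versus a pointwise scorer would need for n candidates. All function names are hypothetical, and the pairwise count assumes every unordered pair is compared exactly once.

```python
import math


def bradley_terry_prob(score_a: float, score_b: float) -> float:
    """Standard Bradley-Terry probability that a is preferred over b,
    treating the two scalar scores as logits."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def pairwise_judgment_count(n: int) -> int:
    """Judgments needed if every unordered pair of n candidates is
    compared once (assumption): n * (n - 1) / 2, i.e. O(n^2)."""
    return n * (n - 1) // 2


def pointwise_judgment_count(n: int) -> int:
    """Judgments needed if each candidate is scored once: O(n)."""
    return n


def rank_by_pointwise_scores(scores: list[float]) -> list[int]:
    """Candidate indices ordered from highest to lowest pointwise score.
    No pairwise comparison between responses is required."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, pairwise_judgment_count(n), pointwise_judgment_count(n))
    print(bradley_terry_prob(2.0, 0.0))
    print(rank_by_pointwise_scores([0.2, 0.9, 0.5]))
```

Because pointwise scores are comparable across candidates, a ranking follows from a single sort, whereas a pairwise judge must be queried for each pair before the results can be aggregated.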

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Recommender Systems and Techniques