PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling

Ai Jian; Jingqing Ruan; Xing Ma; Xiaoyun Zhang; Dailin Li; Weipeng Zhang; Ke Zeng; Xunliang Cai

arXiv:2510.24235 · cs.LG · April 21, 2026

TL;DR

PaTaRM introduces a reward modeling approach that trains a pointwise generative reward model from pairwise preference data and adapts its evaluation criteria to each instance, improving how well language models align with human preferences.

Contribution

It combines pairwise and pointwise signals through a preference-aware reward mechanism and a task-adaptive rubric, improving both reward model training and evaluation.
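To make the idea of a task-adaptive rubric concrete, here is a minimal sketch that merges generic task-level criteria with instance-specific criteria and aggregates per-criterion scores into a single pointwise score. Everything in it is hypothetical: the `Criterion` schema, the task categories, the weights, and the weighted-average aggregation are illustrative assumptions, not PaTaRM's actual rubric format. The instance-specific criteria, which PaTaRM generates dynamically for each prompt, are simply passed in here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    """One evaluation criterion with a relative weight (hypothetical schema)."""
    name: str
    weight: float


# Hypothetical task-level criteria; categories and weights are illustrative
# placeholders, not values taken from the paper.
GENERIC_CRITERIA: Dict[str, List[Criterion]] = {
    "chat": [Criterion("helpfulness", 2.0), Criterion("harmlessness", 1.0)],
    "code": [Criterion("correctness", 3.0), Criterion("readability", 1.0)],
}


def build_rubric(task_type: str, instance_criteria: List[Criterion]) -> List[Criterion]:
    """Merge generic criteria for the task type with instance-specific ones.

    In a generative reward model the instance-specific criteria would be
    produced by the model for each prompt; in this sketch they are given.
    Unknown task types fall back to a single generic criterion.
    """
    generic = GENERIC_CRITERIA.get(task_type, [Criterion("overall_quality", 1.0)])
    return list(generic) + list(instance_criteria)


def rubric_score(rubric: List[Criterion], per_criterion: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores (KeyError if a score is missing)."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    return sum(c.weight * per_criterion[c.name] for c in rubric) / total_weight


if __name__ == "__main__":
    rubric = build_rubric("code", [Criterion("handles_empty_input", 2.0)])
    scores = {"correctness": 6.0, "readability": 6.0, "handles_empty_input": 9.0}
    print(rubric_score(rubric, scores))  # 7.0
```

A weighted average keeps the final score on the same scale as the per-criterion scores, so one pointwise number can be compared across prompts whose rubrics contain different numbers of criteria.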

Findings

01

Achieves an 8.7% average improvement on RewardBench and RMBench.

02

Boosts downstream RLHF performance by 13.6% on IFEval and InFoBench.

03

Enables robust pointwise training without explicit rating labels.

Abstract

Reward models (RMs) are central to reinforcement learning from human feedback (RLHF), providing the critical supervision signals that align large language models (LLMs) with human preferences. Generative reward models (GRMs) provide greater interpretability than traditional scalar RMs, but they come with a critical trade-off: pairwise methods are hindered by a training-inference mismatch, while pointwise methods require expensive absolute annotations. To bridge this gap, we propose the Preference-aware Task-adaptive Reward Model (PaTaRM). Unlike prior approaches, PaTaRM enables robust pointwise training using readily available pairwise data via a novel Preference-Aware Reward (PAR) mechanism, eliminating the need for explicit rating labels. Furthermore, it incorporates a Task-Adaptive Rubric system that dynamically generates instance-specific criteria for precise evaluation. Extensive…
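The sketch below illustrates one way a pairwise label ("chosen is preferred over rejected") can supervise pointwise scores without any absolute rating labels, in the spirit of the Preference-Aware Reward mechanism described above. It is a hypothetical sketch, not PaTaRM's exact formulation: it assumes the generative reward model samples several independent judgments per response, that each judgment has already been parsed into a numeric score, and that a judgment is rewarded when its score lands on the correct side of the mean score given to the other response. The function name and this mean-comparison rule are illustrative assumptions.

```python
from statistics import mean
from typing import List, Tuple


def preference_aware_rewards(
    chosen_scores: List[float],
    rejected_scores: List[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical rule: convert pointwise scores into binary training rewards.

    Only the pairwise label is used. Each judgment of the chosen response
    earns 1.0 if its score is strictly above the mean score of the rejected
    response; each judgment of the rejected response earns 1.0 if its score
    is strictly below the mean score of the chosen response. Ties earn 0.0.
    """
    if not chosen_scores or not rejected_scores:
        raise ValueError("need at least one score for each response")
    mean_chosen = mean(chosen_scores)
    mean_rejected = mean(rejected_scores)
    chosen_rewards = [1.0 if s > mean_rejected else 0.0 for s in chosen_scores]
    rejected_rewards = [1.0 if s < mean_chosen else 0.0 for s in rejected_scores]
    return chosen_rewards, rejected_rewards


if __name__ == "__main__":
    # Three sampled judgments per response, already parsed to scores.
    chosen, rejected = preference_aware_rewards([8.0, 5.0, 9.0], [6.0, 8.0, 2.0])
    print(chosen, rejected)  # [1.0, 0.0, 1.0] [1.0, 0.0, 1.0]
```

Because each judgment scores a single response, the model trained this way is queried pointwise at inference time, which is the training-inference consistency that purely pairwise generative reward models lack.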

Code & Models

Repositories

JaneEyre0530/PaTaRM (GitHub)

Datasets

AIJian/PaTaRM-data
dataset