PIRA: Preference-Oriented Instruction-Tuned Reward Models with Dual Aggregation

Yongfu Xue

arXiv:2511.20668 · cs.CL · February 3, 2026



TL;DR

PIRA is a training paradigm for reward models that exploits the preference instruction-following capability of LLMs, averages rewards across diverse preference-task instructions, and stabilizes reward estimates. Together these improve the alignment of LLMs with human preferences and reduce reward overoptimization.

Contribution

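PIRA introduces a training approach that combines preference instruction reformulation, reward aggregation, and output stabilization to address two limitations of existing reward models: the dependence of discriminative reward models on large-scale annotated data, and susceptibility to reward overoptimization.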

Findings

01

Significantly improves reward model performance.

02

Enhances generalization across tasks.

03

Reduces reward overoptimization.

Abstract

Reward models are pivotal for aligning Large Language Models (LLMs) with human preferences. Existing approaches face two key limitations. First, discriminative reward models require large-scale annotated data because, unlike generative reward models, they cannot exploit the preference instruction-following capability of LLMs. Second, reward models are particularly prone to reward overoptimization, where LLMs exploit weaknesses in the reward function instead of improving true alignment. We introduce PIRA, a training paradigm that integrates three complementary strategies to address these challenges: (1) reformulating question-answer pairs into preference-task instructions to explicitly leverage LLMs' preference instruction-following capability, (2) averaging the rewards aggregated from diverse preference-task instructions for each sample, which mitigates task-specific bias and…
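Strategies (1) and (2) can be illustrated with a minimal sketch. It assumes a reward model exposed as a hypothetical `score_fn` that maps one instruction string to a scalar reward; the template wording and the names `TEMPLATES`, `reformulate`, and `aggregated_reward` are illustrative assumptions, not the paper's actual prompts or code. The third strategy is cut off in the abstract above, so it is not sketched here.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical preference-task instruction templates (NOT the paper's wording).
# Each one reframes the same question-answer pair as a different preference task.
TEMPLATES: tuple[str, ...] = (
    "Judge whether the following answer is helpful and correct.\nQuestion: {q}\nAnswer: {a}",
    "Rate how well the answer satisfies the user's request.\nQuestion: {q}\nAnswer: {a}",
    "Does this response match human preferences?\nQuestion: {q}\nResponse: {a}",
)


def reformulate(question: str, answer: str,
                templates: Sequence[str] = TEMPLATES) -> list[str]:
    """Strategy (1): turn one QA pair into several preference-task instructions."""
    return [t.format(q=question, a=answer) for t in templates]


def aggregated_reward(question: str, answer: str,
                      score_fn: Callable[[str], float],
                      templates: Sequence[str] = TEMPLATES) -> float:
    """Strategy (2): average the reward over all instruction variants of one sample.

    `score_fn` is an assumed stand-in for a reward model's scalar output.
    """
    prompts = reformulate(question, answer, templates)
    return mean(score_fn(p) for p in prompts)
```

For example, with a toy scorer that rewards only the first template, `aggregated_reward("Q", "A", lambda p: 1.0 if p.startswith("Judge") else 0.0)` averages the three per-instruction rewards to 1/3. Averaging is one plausible reading of "averaging the rewards aggregated from diverse preference-task instructions": a bias tied to any single instruction phrasing is diluted across the set.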


Videos

PIRA: Preference-Oriented Instruction-Tuned Reward Models with Dual Aggregation (Underline)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Sentiment Analysis and Opinion Mining