PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model

Baijiong Lin; Weisen Jiang; Yuancheng Xu; Hao Chen; Ying-Cong Chen

arXiv:2505.06274·cs.LG·May 13, 2025

TL;DR

PARM is a single, preference-aware autoregressive reward model that aligns a frozen large language model with diverse user preferences at inference time. Compared with using one reward model per preference dimension, it lowers inference cost and follows the user's preference vector more closely.

Contribution

The paper proposes PARM, a single preference-aware ARM trained jointly across all preference dimensions, which addresses the cost and misalignment limitations of earlier multi-ARM approaches.

Findings

01

PARM reduces inference costs compared to multiple ARMs.

02

PARM achieves better alignment with user preferences.

03

PARM enables weak-to-strong guidance for resource-limited settings.

Abstract

Multi-objective test-time alignment aims to adapt large language models (LLMs) to diverse multi-dimensional user preferences during inference while keeping the LLMs frozen. Recently, GenARM (Xu et al., 2025) independently trains an Autoregressive Reward Model (ARM) for each preference dimension, without awareness of the others, and then combines their outputs according to a user-specific preference vector during inference to achieve multi-objective test-time alignment. This design has two key limitations: the need for multiple ARMs increases the inference cost, and the separate training of ARMs causes misalignment between the guided generation and the user preferences. To address these issues, we propose Preference-aware ARM (PARM), a single unified ARM trained across all preference dimensions. PARM uses our proposed Preference-Aware Bilinear Low-Rank Adaptation (PBLoRA), which employs a…
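
The multi-ARM combination that the abstract attributes to GenARM can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each ARM yields next-token logits, that the guided distribution adds a preference-weighted sum of ARM log-probabilities to the frozen base model's log-probabilities, and that a temperature-like scalar `beta` scales the reward term. The function names, `beta`, and the toy three-token vocabulary are all hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def guided_next_token_logprobs(base_logits, arm_logits_list, preference, beta=1.0):
    """Combine a frozen base model with one ARM per preference dimension.

    Assumed (hypothetical) combination rule:
        log p(token) = log p_base(token)
                       + (1 / beta) * sum_i preference[i] * log p_arm_i(token)
    followed by renormalization over the vocabulary.
    """
    preference = np.asarray(preference, dtype=float)
    if len(arm_logits_list) != preference.shape[0]:
        raise ValueError("need exactly one preference weight per ARM")

    combined = log_softmax(np.asarray(base_logits, dtype=float))
    for weight, arm_logits in zip(preference, arm_logits_list):
        arm_logprobs = log_softmax(np.asarray(arm_logits, dtype=float))
        combined = combined + (weight / beta) * arm_logprobs
    return log_softmax(combined)


# Toy example: a uniform base model and two ARMs that favor different tokens.
base = [0.0, 0.0, 0.0]
arm_a = [2.0, 0.0, 0.0]  # stand-in for one preference dimension
arm_b = [0.0, 2.0, 0.0]  # stand-in for another preference dimension

only_a = guided_next_token_logprobs(base, [arm_a, arm_b], [1.0, 0.0])
only_b = guided_next_token_logprobs(base, [arm_a, arm_b], [0.0, 1.0])
balanced = guided_next_token_logprobs(base, [arm_a, arm_b], [0.5, 0.5])
```

In this sketch each entry of `arm_logits_list` would come from a separate ARM forward pass at every decoding step, which is the kind of per-token cost the abstract says a single unified ARM is meant to reduce.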

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Recommender Systems and Techniques