Test-Time Alignment via Hypothesis Reweighting

Yoonho Lee; Jonathan Williams; Henrik Marklund; Archit Sharma; Eric Mitchell; Anikait Singh; Chelsea Finn

arXiv:2412.08812 · cs.LG · April 21, 2026

TL;DR

HyRe enables real-time personalization of reward models by reweighting ensemble heads based on minimal user preference data, significantly outperforming existing methods with negligible computational cost.

Contribution

Proposes Hypothesis Reweighting (HyRe), a novel method for efficient, inference-time personalization by reweighting ensemble heads with minimal labeled examples.

Findings

01 HyRe surpasses state-of-the-art reward models on RewardBench at 2B and 8B scale.

02 HyRe improves reward model accuracy by 20% across 32 tasks.

03 Reweighting ensemble heads with 1-5 preference pairs is effective for personalization.

Abstract

Reward models trained on aggregate preferences often fail to capture individual users' values, but existing adaptation methods such as fine-tuning or long-context conditioning are too costly for real-time personalization. We propose Hypothesis Reweighting (HyRe), which enables real-time personalization by reweighting ensemble members using just 1-5 labeled examples from the target user or domain. Our method builds on the empirical observation that when different heads capture different valid interpretations of preference data, reweighting them can substantially outperform uniform averaging. HyRe trains a single network with multiple prediction heads that capture different valid interpretations of preference data, then uses a Bayesian update to upweight the heads that best match the target user's preferences. This requires only a single forward pass with negligible (<1%) computational…
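The reweighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each head outputs a scalar reward, that a preference pair is scored with a Bradley-Terry (logistic) likelihood on the reward margin, and that the prior over heads is uniform. The names `reweight_heads` and `ensemble_reward` are hypothetical.

```python
import numpy as np

def reweight_heads(margins, prior=None):
    """Posterior weights over heads from a few labeled preference pairs.

    margins: array of shape (n_pairs, n_heads); entry [i, k] is head k's
    reward for the chosen response minus its reward for the rejected one.
    Assumption: Bradley-Terry likelihood, uniform prior unless one is given.
    """
    margins = np.asarray(margins, dtype=float)
    n_heads = margins.shape[1]
    if prior is None:
        prior = np.full(n_heads, 1.0 / n_heads)
    log_prior = np.log(np.asarray(prior, dtype=float))
    # log sigmoid(m) = -log(1 + exp(-m)), summed over the labeled pairs
    log_lik = -np.logaddexp(0.0, -margins).sum(axis=0)
    log_post = log_prior + log_lik
    log_post -= log_post.max()  # subtract the max for numerical stability
    weights = np.exp(log_post)
    return weights / weights.sum()

def ensemble_reward(head_rewards, weights):
    """Weighted ensemble reward; head_rewards has shape (n_items, n_heads)."""
    return np.asarray(head_rewards, dtype=float) @ np.asarray(weights, dtype=float)

# Two labeled pairs, three heads: head 0 agrees with the user on both pairs,
# head 1 disagrees on both, head 2 is indifferent.
example_margins = np.array([[2.0, -2.0, 0.0],
                            [1.5, -1.0, 0.0]])
example_weights = reweight_heads(example_margins)
```

Because the head outputs come from the same forward pass that already scores the responses, the update itself is only a few vector operations. With no labeled pairs the weights stay at the prior, which recovers uniform averaging.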
