Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures

Linhai Zhang; Yulan He

arXiv:2605.10991·cs.LG·May 13, 2026

TL;DR

This paper studies test-time personalization for large language models: sampling multiple candidates from a personalized policy model and selecting the best one with a personalized reward model. It diagnoses why standard reward models fail to benefit from more samples and proposes a probabilistic reward model to address those failures.

Contribution

The paper proves a theoretical ceiling for test-time scaling under oracle selection, derives a unified scaling law that exposes two failure modes of standard reward models, and proposes a probabilistic reward model to mitigate them.

Findings

01

Expected utility grows logarithmically with sample size under oracle selection.

02

Standard reward models often fail due to user-level collapse and reward hacking.

03

The proposed probabilistic reward model effectively mitigates failure modes and improves scaling.
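
The first two findings can be illustrated with a small Best-of-N simulation. This is a minimal sketch under assumptions that are not taken from the paper: candidate utilities are drawn i.i.d. from an Exp(1) distribution (chosen because the expected maximum of N such draws is exactly the harmonic number H_N, which grows like ln N), and user-level collapse is modeled as a reward model that returns the same score for every candidate. The function names `best_of_n_utility`, `simulate`, and `harmonic` are hypothetical.

```python
import numpy as np


def best_of_n_utility(utilities, rewards):
    """True utility of the candidate with the highest reward score, per trial.

    Both arrays have shape (trials, n): one row per trial, one column per candidate.
    """
    chosen = np.argmax(rewards, axis=1)  # ties resolve to the first candidate
    return utilities[np.arange(utilities.shape[0]), chosen]


def simulate(n, trials=20000, seed=0):
    """Mean utility of Best-of-N under an oracle and under a collapsed reward model."""
    rng = np.random.default_rng(seed)
    # Assumption for illustration only: i.i.d. Exp(1) utilities.
    utilities = rng.exponential(1.0, size=(trials, n))
    # Oracle selection: the reward model scores candidates by their true utility.
    oracle = best_of_n_utility(utilities, utilities).mean()
    # Collapsed reward model: a constant score, so selection ignores utility.
    collapsed = best_of_n_utility(utilities, np.zeros_like(utilities)).mean()
    return oracle, collapsed


def harmonic(n):
    """H_N = 1 + 1/2 + ... + 1/N, the exact expected maximum of N Exp(1) draws."""
    return sum(1.0 / k for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64):
        oracle, collapsed = simulate(n)
        print(f"N={n:3d}  oracle={oracle:.3f}  H_N={harmonic(n):.3f}  collapsed={collapsed:.3f}")
```

In this toy setting the oracle column tracks H_N and keeps rising as N doubles, while the collapsed column stays near the mean utility of a single sample regardless of N. Other utility distributions would give a different growth rate, so the sketch shows the shape of the argument rather than the paper's result.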

Abstract

Existing approaches to LLM personalization focus on constructing better personalized models or inputs, while treating inference as a single-shot process. In this work, we study Test-Time Personalization (TTP) along an unexplored axis: scaling inference-time computation by sampling N candidates from a personalized policy model and selecting the best with a personalized reward model. We prove that oracle selection yields expected utility growing logarithmically with the number of sampled candidates, establishing a theoretical ceiling for test-time scaling. However, standard reward models fail to realize this potential. To diagnose why, we derive a unified scaling law that decomposes any reward model's Best-of-N curve into four measurable quantities and reveals two failure modes, user-level collapse (near-constant prediction for some users) and query-level reward hacking (negative…
