APM: Evaluating Style Personalization in LLMs with Arbitrary Preference Mappings

Philipp Spohn; Leander Girrbach; Zeynep Akata

arXiv:2605.21063·cs.CL·May 21, 2026

TL;DR

This paper introduces the APM benchmark, which evaluates style personalization in LLMs by decoupling user attributes from response-trait preferences through a hidden, randomized mapping, so that personalization can be assessed separately from general response quality.

Contribution

The paper proposes the APM benchmark for evaluating implicit style personalization in LLMs and adapts multiple personalization methods for rigorous testing.

Findings

01. Routing is the most reliable personalization approach.

02. RAG improves with stronger base LLMs.

03. Soft prompt optimization shows limited gains.

Abstract

Typical LLM responses tend to follow a default style, even though users often have distinct preferences regarding tone, verbosity, and formality that they do not explicitly state in their prompts. Evaluating whether personalization methods can adapt to these implicit preferences is challenging, since users typically provide prompts rather than reference responses, style preferences are not factually verifiable, and reference-free LLM judges may conflate personalization with general response quality. To address these challenges, we introduce the Arbitrary Preference Mapping (APM) benchmark, which decouples user attributes (e.g. enthusiastic) from response principles (e.g. persuasive) via a hidden, randomized mapping $C$ that maps user attributes to preferences about response traits. Because $C$ carries no semantic content and is resampled across runs, models cannot…
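
To make the idea of a hidden, randomized mapping concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the benchmark's actual implementation: the attribute and trait lists, the function names `sample_mapping` and `preferred_traits`, and the choice that each attribute maps to exactly one preferred trait are all hypothetical. The abstract only states that $C$ maps user attributes to preferences about response traits and is resampled across runs.

```python
import random

# Hypothetical vocabularies; only "enthusiastic" and "persuasive"
# appear in the abstract, the rest are made up for illustration.
USER_ATTRIBUTES = ["enthusiastic", "reserved", "curious"]
RESPONSE_TRAITS = ["persuasive", "concise", "formal"]


def sample_mapping(attributes, traits, seed):
    """Sample a random mapping C: user attribute -> preferred response trait.

    Assumption: one preferred trait per attribute. The pairing is drawn
    at random, so it does not depend on what the words mean.
    """
    rng = random.Random(seed)
    return {attribute: rng.choice(traits) for attribute in attributes}


def preferred_traits(user_attributes, mapping):
    """Return the sorted set of traits a user prefers under mapping C."""
    return sorted({mapping[attribute] for attribute in user_attributes})


# Resampling C for each run: a different seed gives an independent draw.
mapping_run_a = sample_mapping(USER_ATTRIBUTES, RESPONSE_TRAITS, seed=0)
mapping_run_b = sample_mapping(USER_ATTRIBUTES, RESPONSE_TRAITS, seed=1)

user = ["enthusiastic", "curious"]
traits_run_a = preferred_traits(user, mapping_run_a)
traits_run_b = preferred_traits(user, mapping_run_b)
```

Because the pairing is drawn at random rather than from the meaning of the words, a method in this sketch could only recover a user's preferred traits from evidence about that user, not from prior associations between, say, "enthusiastic" and any particular style.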
