Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions
Qianyun Guo, Yibo Li, Yue Liu, Bryan Hooi

TL;DR
This paper introduces RealPref, a benchmark for evaluating how well large language models (LLMs) follow complex user preferences expressed over long, realistic interactions, and shows that current models struggle as contexts grow longer and preferences become more implicit.
Contribution
The work presents RealPref, a new benchmark with 100 diverse user profiles and 1300 personalized preferences, and uses it to analyze LLM performance in long-horizon, realistic personalization scenarios.
Findings
LLM performance drops significantly as context length grows and as preferences are expressed more implicitly
Generalizing preferences to unseen scenarios remains challenging
Longer interactions reduce the accuracy of preference following
Abstract
Large Language Models (LLMs) are increasingly serving as personal assistants, where users share complex and diverse preferences over extended interactions. However, assessing how well LLMs can follow these preferences in realistic, long-term situations remains underexplored. This work proposes RealPref, a benchmark for evaluating realistic preference-following in personalized user-LLM interactions. RealPref features 100 user profiles, 1300 personalized preferences, four types of preference expression (ranging from explicit to implicit), and long-horizon interaction histories. It includes three types of test questions (multiple-choice, true-or-false, and open-ended), with detailed rubrics for LLM-as-a-judge evaluation. Results indicate that LLM performance significantly drops as context length grows and preference expression becomes more implicit, and that generalizing user preference…
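The benchmark structure described in the abstract (user profiles, preferences, four expression types, three question formats, rubric-based judging) can be made concrete with a small data model. The sketch below is hypothetical: the class and field names, the 0 to 3 encoding of expression types, and the exact-match scoring are illustrative assumptions, not RealPref's released format or the authors' evaluation code. It scores only the multiple-choice and true-or-false items and leaves open-ended answers to a rubric-based LLM judge, which it does not implement.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class QuestionType(Enum):
    # The three test-question formats named in the abstract.
    MULTIPLE_CHOICE = "multiple_choice"
    TRUE_FALSE = "true_false"
    OPEN_ENDED = "open_ended"


@dataclass
class TestItem:
    # Hypothetical record layout; field names are illustrative only.
    profile_id: int             # index of one of the user profiles
    preference_id: int          # index of one of the personalized preferences
    expression_level: int       # assumed encoding: 0 = most explicit ... 3 = most implicit
    question_type: QuestionType
    gold_answer: Optional[str]  # None for open-ended items (rubric-graded instead)


def accuracy_by_level(items: List[TestItem], predictions: List[str]) -> Dict[int, float]:
    """Exact-match accuracy per expression level, closed-form items only.

    Open-ended items are skipped: they would be graded by an LLM judge
    against a rubric, which this sketch does not model.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for item, pred in zip(items, predictions):
        if item.question_type is QuestionType.OPEN_ENDED:
            continue
        total[item.expression_level] += 1
        # Case- and whitespace-insensitive match, e.g. "b" matches "B".
        if pred.strip().lower() == item.gold_answer.strip().lower():
            correct[item.expression_level] += 1
    return {level: correct[level] / total[level] for level in sorted(total)}


# Toy usage with made-up items (not benchmark data).
toy_items = [
    TestItem(0, 0, 0, QuestionType.MULTIPLE_CHOICE, "B"),
    TestItem(0, 1, 3, QuestionType.TRUE_FALSE, "True"),
    TestItem(1, 2, 3, QuestionType.TRUE_FALSE, "False"),
    TestItem(1, 3, 1, QuestionType.OPEN_ENDED, None),
]
print(accuracy_by_level(toy_items, ["b", "True", "True", "free text"]))
# {0: 1.0, 3: 0.5}
```

Grouping scores by expression level mirrors the kind of explicit-versus-implicit comparison the abstract reports; a fuller harness would also bucket by context length and route open-ended answers to a judge model.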
Taxonomy
Topics: AI in Service Interactions · Artificial Intelligence in Healthcare and Education · Topic Modeling
