Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions

Qianyun Guo; Yibo Li; Yue Liu; Bryan Hooi

arXiv:2603.04191 · cs.AI · March 5, 2026

TL;DR

This paper introduces RealPref, a benchmark for evaluating how effectively large language models follow complex user preferences across long, realistic interactions. It shows that current models degrade as interactions grow longer and as preferences are expressed more implicitly.

Contribution

The work presents RealPref, a new benchmark with diverse user profiles and preferences, and analyzes LLM performance in long-horizon, realistic personalization scenarios.

Findings

01. LLM performance declines with longer context and more implicit preference expression
02. Generalizing preferences to unseen scenarios remains challenging
03. Longer interactions reduce the accuracy of preference following

Abstract

Large Language Models (LLMs) are increasingly serving as personal assistants, where users share complex and diverse preferences over extended interactions. However, assessing how well LLMs can follow these preferences in realistic, long-term situations remains underexplored. This work proposes RealPref, a benchmark for evaluating realistic preference-following in personalized user-LLM interactions. RealPref features 100 user profiles, 1300 personalized preferences, four types of preference expression (ranging from explicit to implicit), and long-horizon interaction histories. It includes three types of test questions (multiple-choice, true-or-false, and open-ended), with detailed rubrics for LLM-as-a-judge evaluation. Results indicate that LLM performance significantly drops as context length grows and preference expression becomes more implicit, and that generalizing user preference…
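The abstract lists the benchmark's building blocks (user profiles, preferences, four levels of preference expression, three question types, rubric-based LLM-as-a-judge scoring) but not its data format or scoring code. The following is a minimal Python sketch of how such test items might be represented and scored, assuming a hypothetical record layout: `TestItem`, its field names, the 0 to 3 `expression_level` scale, and exact-match scoring for closed-form questions are all illustrative assumptions, not RealPref's actual schema. Open-ended answers are not judged here; the sketch simply accepts an externally supplied judge verdict.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class QuestionType(Enum):
    # The three test-question formats named in the abstract.
    MULTIPLE_CHOICE = "multiple-choice"
    TRUE_OR_FALSE = "true-or-false"
    OPEN_ENDED = "open-ended"


@dataclass
class TestItem:
    # Hypothetical record layout; field names are illustrative, not RealPref's schema.
    profile_id: int
    preference_id: int
    expression_level: int  # hypothetical scale: 0 = most explicit ... 3 = most implicit
    question_type: QuestionType
    gold: str  # expected answer for closed-form items; unused for open-ended items


def is_correct(item: TestItem, model_answer: str, judge_pass: Optional[bool] = None) -> bool:
    # Closed-form questions: case-insensitive exact match against the gold label.
    if item.question_type is not QuestionType.OPEN_ENDED:
        return model_answer.strip().lower() == item.gold.strip().lower()
    # Open-ended questions: rely on a verdict from a separate rubric-based LLM judge.
    if judge_pass is None:
        raise ValueError("open-ended items need a judge verdict")
    return judge_pass


def accuracy_by_level(results: List[Tuple[TestItem, bool]]) -> Dict[int, float]:
    # Group correctness by expression level to see how accuracy changes
    # as preferences are expressed more implicitly.
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for item, correct in results:
        totals[item.expression_level] += 1
        hits[item.expression_level] += int(correct)
    return {level: hits[level] / totals[level] for level in sorted(totals)}
```

Grouping by `expression_level` mirrors the abstract's explicit-to-implicit axis; the same loop could key on a context-length bucket instead to examine the reported drop with longer interaction histories.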

Taxonomy

Topics: AI in Service Interactions · Artificial Intelligence in Healthcare and Education · Topic Modeling