Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
Siyan Zhao, Mingyi Hong, Yang Liu, Devamanyu Hazarika, Kaixiang Lin

TL;DR
This paper introduces PrefEval, a benchmark for assessing how well large language models can recognize, memorize, and follow user preferences in long conversations, revealing current limitations and potential improvements.
Contribution
The paper presents PrefEval, a comprehensive benchmark with 3,000 preference-query pairs, to evaluate and improve LLMs' ability to follow user preferences in multi-turn conversations.
Findings
State-of-the-art LLMs perform poorly in preference following, with accuracy below 10% at 10 turns.
Advanced prompting and retrieval methods only partially improve preference following.
Fine-tuning on PrefEval significantly enhances LLMs' ability to follow preferences.
Abstract
Large Language Models (LLMs) are increasingly used as chatbots, yet their ability to personalize responses to user preferences remains limited. We introduce PrefEval, a benchmark for evaluating LLMs' ability to infer, memorize and adhere to user preferences in a long-context conversational setting. PrefEval comprises 3,000 manually curated user preference and query pairs spanning 20 topics. PrefEval contains user personalization or preference information in both explicit and implicit forms, and evaluates LLM performance using a generation and a classification task. With PrefEval, we evaluated the aforementioned preference following capabilities of 10 open-source and proprietary LLMs in multi-session conversations with varying context lengths up to 100k tokens. We benchmark with various prompting, iterative feedback, and retrieval-augmented generation methods. Our benchmarking effort…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsData Mining Algorithms and Applications · Semantic Web and Ontologies · Artificial Intelligence in Law
