Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs

Siyan Zhao; Mingyi Hong; Yang Liu; Devamanyu Hazarika; Kaixiang Lin

arXiv:2502.09597 · cs.LG · February 14, 2025 · 3 cites

TL;DR

This paper introduces PrefEval, a benchmark for assessing how well large language models can recognize, memorize, and follow user preferences in long conversations, revealing current limitations and potential improvements.

Contribution

The paper presents PrefEval, a comprehensive benchmark with 3,000 preference-query pairs, to evaluate and improve LLMs' ability to follow user preferences in multi-turn conversations.

Findings

01. In zero-shot settings, state-of-the-art LLMs follow preferences poorly: accuracy falls below 10% at only 10 turns for most evaluated models.

02. Advanced prompting and retrieval methods only partially improve preference following.

03. Fine-tuning on PrefEval significantly enhances LLMs' ability to follow preferences.

Abstract

Large Language Models (LLMs) are increasingly used as chatbots, yet their ability to personalize responses to user preferences remains limited. We introduce PrefEval, a benchmark for evaluating LLMs' ability to infer, memorize and adhere to user preferences in a long-context conversational setting. PrefEval comprises 3,000 manually curated user preference and query pairs spanning 20 topics. PrefEval contains user personalization or preference information in both explicit and implicit forms, and evaluates LLM performance using a generation and a classification task. With PrefEval, we evaluated the aforementioned preference following capabilities of 10 open-source and proprietary LLMs in multi-session conversations with varying context lengths up to 100k tokens. We benchmark with various prompting, iterative feedback, and retrieval-augmented generation methods. Our benchmarking effort…

Code & Models

Repositories

amazon-science/PrefEval (Official)

Taxonomy

Topics: Data Mining Algorithms and Applications · Semantic Web and Ontologies · Artificial Intelligence in Law