Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation
Se-eun Yoon, Zhankui He, Jessica Maria Echterhoff, Julian McAuley

TL;DR
This paper assesses the potential of large language models to serve as synthetic user simulators in conversational recommendation systems, introducing a protocol to evaluate their human-like behavior across five key tasks.
Contribution
It presents a new evaluation protocol for measuring how well language models emulate human user behavior in conversational recommendation scenarios.
Findings
Language models show promise but deviate from human behavior in key tasks.
Evaluation tasks reveal specific areas where models can be improved.
Prompting strategies can reduce deviations from human-like responses.
Abstract
Synthetic users are cost-effective proxies for real users in the evaluation of conversational recommender systems. Large language models show promise in simulating human-like behavior, raising the question of their ability to represent a diverse population of users. We introduce a new protocol to measure the degree to which language models can accurately emulate human behavior in conversational recommendation. This protocol comprises five tasks, each designed to evaluate a key property that a synthetic user should exhibit: choosing which items to talk about, expressing binary preferences, expressing open-ended preferences, requesting recommendations, and giving feedback. Through evaluation of baseline simulators, we demonstrate that these tasks effectively reveal deviations of language models from human behavior, and offer insights on how to reduce the deviations with model selection…
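To make the idea of "deviation from human behavior" concrete, here is a minimal Python sketch. Only the five task names come from the abstract. The `total_variation` helper, the choice of total variation distance as the metric, and the toy item lists are assumptions made for illustration and should not be read as the paper's actual measures or data.

```python
from collections import Counter
from enum import Enum


class Task(Enum):
    # The five tasks named in the abstract.
    ITEM_SELECTION = "choosing which items to talk about"
    BINARY_PREFERENCE = "expressing binary preferences"
    OPEN_ENDED_PREFERENCE = "expressing open-ended preferences"
    RECOMMENDATION_REQUEST = "requesting recommendations"
    FEEDBACK = "giving feedback"


def total_variation(human, simulated):
    """Total variation distance between two empirical label distributions.

    Hypothetical metric chosen for this sketch, not taken from the paper.
    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    if not human or not simulated:
        raise ValueError("both samples must be non-empty")
    h, s = Counter(human), Counter(simulated)
    labels = set(h) | set(s)
    # Counter returns 0 for labels it has not seen.
    return 0.5 * sum(
        abs(h[x] / len(human) - s[x] / len(simulated)) for x in labels
    )


# Made-up toy data for the item-selection task: which items each
# population chose to mention.
human_mentions = ["Inception", "Titanic", "Inception", "Up"]
simulated_mentions = ["Inception", "Inception", "Inception", "Titanic"]

deviation = total_variation(human_mentions, simulated_mentions)
print(f"{Task.ITEM_SELECTION.value}: deviation = {deviation:.2f}")
```

In this toy case the simulator over-mentions one popular item and never mentions another, so the distance is nonzero. A simulator that matched the human distribution exactly would score 0.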
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Recommender Systems and Techniques
