Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation

Se-eun Yoon; Zhankui He; Jessica Maria Echterhoff; Julian McAuley

arXiv:2403.09738 · cs.CL · March 27, 2024 · 1 citation

TL;DR

This paper assesses the potential of large language models to serve as synthetic user simulators in conversational recommendation systems, introducing a protocol to evaluate their human-like behavior across five key tasks.

Contribution

It presents a new evaluation protocol for measuring how well language models emulate human user behavior in conversational recommendation scenarios.

Findings

01. Language models show promise but deviate from human behavior in key tasks.

02. Evaluation tasks reveal specific areas where models can be improved.

03. Prompting strategies can reduce deviations from human-like responses.

Abstract

Synthetic users are cost-effective proxies for real users in the evaluation of conversational recommender systems. Large language models show promise in simulating human-like behavior, raising the question of their ability to represent a diverse population of users. We introduce a new protocol to measure the degree to which language models can accurately emulate human behavior in conversational recommendation. This protocol is comprised of five tasks, each designed to evaluate a key property that a synthetic user should exhibit: choosing which items to talk about, expressing binary preferences, expressing open-ended preferences, requesting recommendations, and giving feedback. Through evaluation of baseline simulators, we demonstrate these tasks effectively reveal deviations of language models from human behavior, and offer insights on how to reduce the deviations with model selection…
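
As a rough illustration of how a protocol like this could be organized in code, the sketch below lists the five tasks named in the abstract and compares a simulator's answers with human answers on one of them. The names `Task`, `distribution`, and `total_variation` are hypothetical, the answer lists are made-up toy data, and total variation distance is an assumed stand-in, not necessarily the metric the paper uses.

```python
from collections import Counter
from enum import Enum


class Task(Enum):
    # The five tasks named in the abstract; identifiers are illustrative.
    CHOOSE_ITEMS = "choosing which items to talk about"
    BINARY_PREFERENCE = "expressing binary preferences"
    OPEN_ENDED_PREFERENCE = "expressing open-ended preferences"
    REQUEST_RECOMMENDATION = "requesting recommendations"
    GIVE_FEEDBACK = "giving feedback"


def distribution(answers):
    """Turn a non-empty list of categorical answers into relative frequencies."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(human_answers, simulated_answers):
    """Total variation distance between two empirical answer distributions.

    0.0 means identical distributions, 1.0 means disjoint ones.
    Assumed stand-in metric for "deviation from human behavior".
    """
    p = distribution(human_answers)
    q = distribution(simulated_answers)
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)


# Toy, made-up answers for the binary-preference task.
human = ["like", "like", "dislike", "dislike"]
simulated = ["like", "like", "like", "dislike"]
deviation = total_variation(human, simulated)
print(Task.BINARY_PREFERENCE.value, deviation)
```

A per-task score of this kind would make it possible to compare simulators or prompting strategies side by side; open-ended tasks would need a different comparison than a categorical distance.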

Code & Models

Repositories

granelle/naacl24-user-sim (Official)

Videos

Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation · underline

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Recommender Systems and Techniques