Evaluating Conversational Recommender Systems via User Simulation

Shuo Zhang; Krisztian Balog

arXiv:2006.08732·cs.IR·June 17, 2020

TL;DR

This paper introduces a user simulation approach for evaluating conversational recommender systems. The goal is to replace costly human evaluation with automated simulation whose evaluation measures correlate well with human judgments.

Contribution

The paper presents a user simulation method that models both individual preferences and the flow of interaction with the system, so that automatic evaluation of conversational recommenders agrees more closely with human assessment.

Findings

01. Preference modeling enhances simulation realism.

02. Task-specific interaction models improve evaluation quality.

03. Automatic measures correlate strongly with human assessments.

Abstract

Conversational information access is an emerging research area. Currently, human evaluation is used for end-to-end system evaluation, which is both time- and resource-intensive at scale and thus becomes a bottleneck to progress. As an alternative, we propose automated evaluation by means of simulating users. Our user simulator aims to generate responses that a real human would give by considering both individual preferences and the general flow of interaction with the system. We evaluate our simulation approach on an item recommendation task by comparing three existing conversational recommender systems. We show that preference modeling and task-specific interaction models both contribute to more realistic simulations, and can help achieve high correlation between automatic evaluation measures and manual human assessments.
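
To make the two ingredients named in the abstract concrete, here is a toy sketch of a simulated user that combines an interaction model (which user intent follows a given system action) with a preference model (what the user likes). It is not the authors' simulator: the intent names, the `NEXT_INTENT` table, the `SimulatedUser` class, the genre-based preferences, and the canned utterances are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical interaction model: the user intent that follows each system action.
NEXT_INTENT = {"elicit": "disclose", "recommend": "feedback", "end": "complete"}


@dataclass
class SimulatedUser:
    # Hypothetical preference model: genre -> True (likes) or False (dislikes).
    preferences: dict

    def respond(self, system_action, genre=None):
        """Return (intent, utterance) for one system action."""
        # The interaction model picks the intent; unknown actions end the dialogue.
        intent = NEXT_INTENT.get(system_action, "complete")
        if intent == "disclose":
            # The preference model supplies the content of the disclosure.
            liked = sorted(g for g, ok in self.preferences.items() if ok)
            text = f"I like {liked[0]} items." if liked else "I have no strong preference."
        elif intent == "feedback":
            # Toy choice: genres missing from the preference model count as disliked.
            text = "Sounds good." if self.preferences.get(genre, False) else "Not for me."
        else:
            text = "Thanks, bye."
        return intent, text


user = SimulatedUser({"comedy": True, "horror": False})
dialogue = [
    user.respond("elicit"),
    user.respond("recommend", "horror"),
    user.respond("recommend", "comedy"),
    user.respond("end"),
]
for intent, text in dialogue:
    print(f"{intent}: {text}")
```

In this sketch the interaction model is a fixed lookup table and the preference model is a plain dictionary. A more realistic simulator would presumably estimate both from data, which is outside the scope of this illustration.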

Code & Models

Repositories

iai-group/kdd2020-usersim (Official)
