Evaluating Conversational Recommender Systems via Large Language Models: A User-Centric Framework

Nuo Chen, Quanyu Dai, Xiaoyu Dong, Piaohong Wang, Qinglin Jia, Zhaocheng Du, Zhenhua Dong, Xiao-Ming Wu

arXiv:2501.09493 · cs.IR · January 27, 2026 · 2 citations


TL;DR

This paper introduces CoRE, a user-centric evaluation framework for conversational recommender systems that uses large language models to assess multiple user experience factors and synthesize an overall performance score, aligning well with human judgments.

Contribution

The paper presents a novel LLM-based evaluation framework, CoRE, that assesses multiple user experience factors and combines them into an overall score through a multi-agent debate, improving alignment with human evaluation.
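The Contribution describes combining per-factor scores into one overall score through a multi-agent debate. Below is a toy illustration of that aggregation step, built on explicit assumptions. The factor names, the 1-5 scale, the three agents, and their weights are all hypothetical, and a simple iterative-averaging (consensus) rule takes the place of the LLM debate. It is not CoRE's prompt design or debate protocol.

```python
from statistics import mean

# Hypothetical per-factor scores (1-5 scale) for one dialogue; the factor
# names and scale are illustrative assumptions, not CoRE's actual factors.
FACTOR_SCORES = {"relevance": 4.0, "fluency": 5.0, "proactiveness": 3.0}

# Hypothetical "agents": each weighs the factors differently. In an
# LLM-based setup each agent would be a prompted model; here they are stubs.
AGENT_WEIGHTS = [
    {"relevance": 0.6, "fluency": 0.2, "proactiveness": 0.2},
    {"relevance": 0.2, "fluency": 0.6, "proactiveness": 0.2},
    {"relevance": 0.2, "fluency": 0.2, "proactiveness": 0.6},
]


def initial_proposal(weights, scores):
    """An agent's opening overall score: a weighted average of factor scores."""
    return sum(weights[f] * scores[f] for f in scores) / sum(weights.values())


def debate(scores, agent_weights, stubbornness=0.5, tol=0.01, max_rounds=20):
    """Toy stand-in for a debate: each round, every agent moves partway toward
    the mean of the other agents' proposals; stop once all proposals agree
    within `tol` (or after `max_rounds`) and return their mean."""
    proposals = [initial_proposal(w, scores) for w in agent_weights]
    for _ in range(max_rounds):
        if max(proposals) - min(proposals) <= tol:
            break
        updated = []
        for i, p in enumerate(proposals):
            others = mean(proposals[:i] + proposals[i + 1:])
            updated.append(stubbornness * p + (1 - stubbornness) * others)
        proposals = updated
    return mean(proposals)


if __name__ == "__main__":
    # Opening proposals are 4.0, 4.4 and 3.6; they converge to their mean.
    print(round(debate(FACTOR_SCORES, AGENT_WEIGHTS), 2))  # prints 4.0
```

Each update pulls an agent toward the others without changing the group mean, so this toy rule always returns the average of the opening proposals (4.0 here). A real LLM debate can shift the consensus as agents exchange arguments, and this stub does not model that.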

Findings

01. CoRE's scores align closely with human evaluations on key factors.

02. CoRE outperforms existing rule-based metrics in reflecting user experience.

03. The framework effectively evaluates four CRSs on benchmark datasets.

Abstract

Conversational recommender systems (CRSs) integrate both recommendation and dialogue tasks, making their evaluation uniquely challenging. Existing approaches primarily assess CRS performance by separately evaluating item recommendation and dialogue management using rule-based metrics. However, these methods fail to capture the real human experience, and they do not support direct conclusions about the system's overall performance. As conversational recommender systems become increasingly vital in e-commerce, social media, and customer support, the ability to evaluate both recommendation accuracy and dialogue management quality using a single metric, thereby authentically reflecting user experience, has become the principal challenge impeding progress in this field. In this work, we propose a user-centric evaluation framework based on large language models (LLMs) for CRSs, namely…
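The abstract notes that existing approaches score recommendation and dialogue separately with rule-based metrics. To make that concrete, here is a sketch of two metrics commonly reported in CRS work, Recall@k for recommendation and Distinct-n for response diversity. The source does not say which rule-based metrics it compares against, so the choice of these two, the whitespace tokenizer, and the example data are all illustrative assumptions.

```python
def recall_at_k(ranked_items, target_items, k):
    """Fraction of ground-truth items that appear in the top-k recommendations."""
    if not target_items:
        return 0.0
    top_k = set(ranked_items[:k])
    hits = sum(1 for item in target_items if item in top_k)
    return hits / len(target_items)


def distinct_n(responses, n):
    """Ratio of unique n-grams to total n-grams across system responses.
    Tokenizes on whitespace, a simplifying assumption."""
    ngrams = []
    for text in responses:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


if __name__ == "__main__":
    # Hypothetical system output for illustration only.
    ranked = ["Inception", "Interstellar", "Memento", "Tenet"]
    responses = ["you might like Inception", "you might like Memento"]
    print(recall_at_k(ranked, ["Memento"], k=3))  # prints 1.0
    print(recall_at_k(ranked, ["Memento"], k=2))  # prints 0.0
    print(distinct_n(responses, 1))  # prints 0.625
    print(round(distinct_n(responses, 2), 3))  # prints 0.667
```

Each function returns its own number on its own scale. Neither says how a user experienced the conversation as a whole, and nothing here combines them into one score. That is the gap the abstract describes.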


Taxonomy

Topics: Recommender Systems and Techniques · Topic Modeling · Advanced Text Analysis Techniques

Methods: ALIGN