Evaluating Conversational Recommender Systems via Large Language Models: A User-Centric Framework
Nuo Chen, Quanyu Dai, Xiaoyu Dong, Piaohong Wang, Qinglin Jia, Zhaocheng Du, Zhenhua Dong, Xiao-Ming Wu

TL;DR
This paper introduces CoRE, a user-centric evaluation framework for conversational recommender systems that uses large language models to assess multiple user experience factors and synthesize an overall performance score, aligning well with human judgments.
Contribution
The paper presents a novel LLM-based evaluation framework, CoRE, that assesses multiple user experience factors and combines them into an overall score through a multi-agent debate, improving alignment with human evaluation.
Findings
CoRE's scores align closely with human evaluations on key factors.
CoRE outperforms existing rule-based metrics in reflecting user experience.
The framework effectively evaluates four CRSs on benchmark datasets.
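The summary above does not say how alignment with human evaluation is measured. One common choice for this kind of comparison is a rank correlation between the automatic scores and the human ratings. The sketch below is an assumption for illustration rather than the paper's procedure: it computes a Spearman correlation in plain Python, and the names `ranks` and `spearman` and the sample numbers are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Placeholder values for illustration only; these are not results from the paper.
framework_scores = [3.5, 4.0, 2.0, 4.5, 3.0]
human_ratings = [3, 4, 2, 5, 4]
print(spearman(framework_scores, human_ratings))
```

A value near 1 would mean the automatic scores rank dialogues in nearly the same order as the human raters, while a value near 0 would indicate little agreement.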
Abstract
Conversational recommender systems (CRSs) integrate both recommendation and dialogue tasks, making their evaluation uniquely challenging. Existing approaches primarily assess CRS performance by separately evaluating item recommendation and dialogue management using rule-based metrics. However, these methods fail to capture the real human experience, and they cannot draw direct conclusions about the system's overall performance. As conversational recommender systems become increasingly vital in e-commerce, social media, and customer support, the ability to evaluate both recommendation accuracy and dialogue management quality using a single metric, thereby authentically reflecting user experience, has become the principal challenge impeding progress in this field. In this work, we propose a user-centric evaluation framework based on large language models (LLMs) for CRSs, namely…
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Advanced Text Analysis Techniques
Methods: ALIGN
