Large Language Models as Evaluators for Recommendation Explanations
Xiaoyu Zhang, Yishan Li, Jiayin Wang, Bowen Sun, Weizhi Ma, Peijie Sun, Min Zhang

TL;DR
This paper explores using large language models like GPT-4 to evaluate the quality of recommendation explanations, aiming to provide a more accurate, reproducible, and cost-effective assessment method based on human-like judgment.
Contribution
It demonstrates that LLMs can serve as effective evaluators for recommendation explanations, and that their accuracy and stability improve when multiple LLMs are ensembled or when LLM evaluations are combined with human labels.
Findings
LLMs can provide evaluations comparable to human judgments.
Ensembling multiple LLMs enhances evaluation accuracy.
Combining human labels with LLM evaluations improves reliability.
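The ensemble finding above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each evaluator returns a numeric rating per explanation, that the ensemble is a plain average, and that agreement with human labels is measured by Pearson correlation; the summary does not state the paper's actual aggregation rule or metric. The evaluator names and all ratings below are made-up toy values, not results from the paper.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def ensemble_scores(scores_by_evaluator):
    """Average the per-explanation scores across all evaluators.

    Averaging is an assumption made for this sketch, not the paper's method.
    """
    columns = zip(*scores_by_evaluator.values())
    return [mean(col) for col in columns]


# Toy ratings on a 1-5 scale for five explanations (hypothetical values).
human = [5, 2, 4, 1, 3]
llm_scores = {
    "evaluator_a": [4, 2, 5, 1, 3],  # hypothetical LLM evaluator
    "evaluator_b": [5, 3, 3, 2, 3],  # hypothetical LLM evaluator
}

combined = ensemble_scores(llm_scores)

# Compare each evaluator, and the ensemble, against the human ratings.
for name, scores in llm_scores.items():
    print(name, round(pearson(human, scores), 3))
print("ensemble", round(pearson(human, combined), 3))
```

On real data, the same comparison would be run over many explanations and, ideally, with a rank-based metric as well, since a handful of toy ratings says nothing about whether ensembling helps in general.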
Abstract
The explainability of recommender systems has attracted significant attention in academia and industry. Many efforts have been made for explainable recommendations, yet evaluating the quality of the explanations remains a challenging and unresolved issue. In recent years, leveraging LLMs as evaluators presents a promising avenue in Natural Language Processing tasks (e.g., sentiment classification, information extraction), as they exhibit strong capabilities in instruction following and common-sense reasoning. However, evaluating recommendation explanatory texts is different from these NLG tasks, as its criteria are related to human perceptions and are usually subjective. In this paper, we investigate whether LLMs can serve as evaluators of recommendation explanations. To answer the question, we utilize real user feedback on explanations from previous work and additionally collect…
Taxonomy
Topics: Topic Modeling
