Large Language Models as Evaluators for Recommendation Explanations

Xiaoyu Zhang; Yishan Li; Jiayin Wang; Bowen Sun; Weizhi Ma; Peijie Sun; Min Zhang

arXiv:2406.03248·cs.IR·June 7, 2024

TL;DR

This paper explores using large language models such as GPT-4 to evaluate the quality of recommendation explanations, aiming for an assessment method that is accurate, reproducible, and cost-effective while approximating human judgment.

Contribution

It demonstrates that LLMs can serve as effective evaluators of recommendation explanations, and that ensembling multiple LLMs and combining LLM evaluations with human labels improve the accuracy and stability of the evaluation.

Findings

01. LLMs can provide evaluations comparable to human judgments.

02. An ensemble of multiple LLMs improves evaluation accuracy.

03. Combining human labels with LLM evaluations improves reliability.
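The ensemble finding can be illustrated with a minimal sketch. It assumes the simplest possible ensemble, averaging each explanation's score across evaluators, and measures agreement with human ratings by Pearson correlation; the summary above does not say which ensemble rule or agreement metric the paper uses. The evaluator names and all scores below are made-up illustrative values, not data from the paper.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ensemble_scores(scores_by_evaluator):
    """Average each item's score across all evaluators (assumed ensemble rule)."""
    return [mean(item) for item in zip(*scores_by_evaluator.values())]


# Hypothetical 1-5 ratings for five explanations (illustrative only).
human = [1, 2, 3, 4, 5]
llm_scores = {
    "evaluator_a": [2, 1, 3, 5, 4],  # hypothetical LLM evaluator
    "evaluator_b": [1, 3, 2, 4, 5],  # hypothetical LLM evaluator
}

ensemble = ensemble_scores(llm_scores)

# Compare each single evaluator and the ensemble against the human ratings.
for name, scores in llm_scores.items():
    print(name, round(pearson(human, scores), 3))
print("ensemble", round(pearson(human, ensemble), 3))
```

On this toy data the averaged scores correlate more strongly with the human ratings than either evaluator alone, because the two evaluators make errors on different items. That is the intuition behind ensembling, not a reproduction of the paper's results.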

Abstract

The explainability of recommender systems has attracted significant attention in academia and industry. Many efforts have been made toward explainable recommendation, yet evaluating the quality of the explanations remains a challenging and unresolved issue. In recent years, leveraging LLMs as evaluators has emerged as a promising avenue in Natural Language Processing tasks (e.g., sentiment classification, information extraction), as they exhibit strong capabilities in instruction following and common-sense reasoning. However, evaluating recommendation explanatory texts is different from these tasks, as its criteria are related to human perceptions and are usually subjective. In this paper, we investigate whether LLMs can serve as evaluators of recommendation explanations. To answer the question, we utilize real user feedback on explanations from previous work and additionally collect…

Code & Models

Repositories

xiaoyu-sz/llmasevaluator
PyTorch · Official

Taxonomy

Topics: Topic Modeling