Beyond Utility: Evaluating LLM as Recommender
Chumeng Jiang, Jiayin Wang, Weizhi Ma, Charles L. A. Clarke, Shuai Wang, Chuhan Wu, Min Zhang

TL;DR
This paper introduces a multidimensional evaluation framework for LLM-based recommenders that covers LLM-specific aspects such as hallucinations and candidate position bias, and uses it to compare seven models across multiple datasets and strategies.
Contribution
It proposes a novel multidimensional evaluation framework for LLM recommenders, addressing aspects beyond utility, and provides comprehensive analysis of seven models using this framework.
Findings
In ranking tasks, LLMs perform best when they have prior knowledge of the domain and when user histories are short.
LLMs outperform traditional models in re-ranking settings.
Candidate position bias and hallucinations are significant issues in LLM recommenders.
Abstract
With the rapid development of Large Language Models (LLMs), recent studies employed LLMs as recommenders to provide personalized information services for distinct users. Despite efforts to improve the accuracy of LLM-based recommendation models, relatively little attention is paid to beyond-utility dimensions. Moreover, there are unique evaluation aspects of LLM-based recommendation models, which have been largely ignored. To bridge this gap, we explore four new evaluation dimensions and propose a multidimensional evaluation framework. The new evaluation dimensions include: 1) history length sensitivity, 2) candidate position bias, 3) generation-involved performance, and 4) hallucinations. All four dimensions have the potential to impact performance, but are largely unnecessary for consideration in traditional systems. Using this multidimensional evaluation framework, along with…
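Two of the dimensions named above, hallucinations and candidate position bias, lend themselves to a small illustration. The sketch below is not the paper's metric definitions, which this summary does not give. It assumes that a hallucination is a recommended item absent from the candidate list, and that position bias shows up as a gap in hit rate depending on where the ground-truth item was placed in the prompt. The function names `hallucination_rate` and `hit_rate_by_position`, and the trial tuple layout, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def hallucination_rate(recommended: Sequence[str], candidates: Sequence[str]) -> float:
    """Fraction of recommended items that do not appear in the candidate list.

    Assumption: an item outside the candidate list counts as hallucinated.
    """
    if not recommended:
        return 0.0
    candidate_set = set(candidates)
    missing = [item for item in recommended if item not in candidate_set]
    return len(missing) / len(recommended)


# One trial: (prompt position of the ground-truth item, model's ranked output, ground-truth item)
Trial = Tuple[int, List[str], str]


def hit_rate_by_position(trials: Sequence[Trial], k: int = 1) -> Dict[int, float]:
    """Hit@k grouped by where the ground-truth item sat in the candidate prompt.

    Assumption: a large gap between positions suggests candidate position bias.
    """
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for position, ranked, target in trials:
        totals[position] += 1
        if target in ranked[:k]:
            hits[position] += 1
    return {pos: hits[pos] / totals[pos] for pos in sorted(totals)}
```

In use, one would run the same user and candidate set several times with the ground-truth item moved to different prompt positions, then compare the per-position hit rates. A model without position bias should score roughly the same at every position.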
Taxonomy
Topics: Library Science and Information Systems · Artificial Intelligence in Law
Methods: Softmax · Attention Is All You Need
