Beyond Utility: Evaluating LLM as Recommender

Chumeng Jiang; Jiayin Wang; Weizhi Ma; Charles L. A. Clarke; Shuai Wang; Chuhan Wu; Min Zhang

arXiv:2411.00331 · cs.IR · November 4, 2024


TL;DR

This paper introduces a multidimensional evaluation framework for LLM-based recommenders that highlights new dimensions such as hallucinations and candidate position bias, and uses it to compare seven models across multiple datasets and strategies.

Contribution

It proposes a novel multidimensional evaluation framework for LLM-based recommenders that covers aspects beyond utility, and uses this framework to provide a comprehensive analysis of seven models.

Findings

01. In ranking tasks, LLMs excel in domains covered by their prior knowledge and when user histories are short.
02. LLMs outperform traditional recommendation models in re-ranking settings.
03. Candidate position bias and hallucinations are significant issues in LLM-based recommenders.
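Finding 03 flags hallucinations as a significant issue. One simple way to quantify them is the fraction of output items that the model made up. The sketch below assumes a hypothetical definition, namely that a hallucination is any recommended item that does not appear in the candidate list given to the model; the paper's exact criterion may differ.

```python
from typing import Sequence


def hallucination_rate(recommended: Sequence[str], candidates: Sequence[str]) -> float:
    """Fraction of recommended items that are not in the candidate list.

    Hypothetical definition (an assumption, not necessarily the paper's):
    an item counts as hallucinated if it is not an exact string match for
    any candidate. An empty recommendation list is given a rate of 0.0.
    """
    if not recommended:
        return 0.0
    valid = set(candidates)  # set lookup makes each membership check O(1)
    invalid = sum(1 for item in recommended if item not in valid)
    return invalid / len(recommended)
```

For example, `hallucination_rate(["A", "B", "Z"], ["A", "B", "C"])` returns one third, since only `"Z"` falls outside the candidate list. Exact string matching is a deliberate simplification; real LLM outputs usually need title normalization before comparison.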

Abstract

With the rapid development of Large Language Models (LLMs), recent studies have employed LLMs as recommenders to provide personalized information services to individual users. Despite efforts to improve the accuracy of LLM-based recommendation models, relatively little attention has been paid to beyond-utility dimensions. Moreover, LLM-based recommendation models have unique evaluation aspects that have been largely ignored. To bridge this gap, we explore four new evaluation dimensions and propose a multidimensional evaluation framework. The new evaluation dimensions are: 1) history length sensitivity, 2) candidate position bias, 3) generation-involved performance, and 4) hallucinations. All four dimensions can affect performance, yet they largely need not be considered in traditional systems. Using this multidimensional evaluation framework, along with…
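The candidate position bias dimension can be probed with a sketch like the following. It assumes a hypothetical `recommend(history, candidates)` callable that returns a ranked list, inserts the ground-truth item at every slot of an otherwise fixed candidate list, and records whether it comes out on top. This is an illustrative protocol, not necessarily the one used in the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: (user history, candidate list) -> ranked candidates.
Recommender = Callable[[Sequence[str], List[str]], List[str]]


def top1_by_position(
    recommend: Recommender,
    history: Sequence[str],
    target: str,
    negatives: Sequence[str],
) -> List[float]:
    """For each insertion slot, 1.0 if the target is ranked first, else 0.0.

    The target is inserted at every position of the candidate list in turn
    while the order of the negatives stays fixed, so any variation across
    slots is caused by position alone.
    """
    results = []
    for pos in range(len(negatives) + 1):
        candidates = list(negatives[:pos]) + [target] + list(negatives[pos:])
        ranking = recommend(history, candidates)
        results.append(1.0 if ranking and ranking[0] == target else 0.0)
    return results


def mean_by_position(per_user: List[List[float]]) -> List[float]:
    """Average per-slot results over users to get a top-1 rate per slot."""
    return [sum(col) / len(col) for col in zip(*per_user)]


def first_slot_recommender(history: Sequence[str], candidates: List[str]) -> List[str]:
    # Toy recommender with maximal position bias: it keeps the input order.
    return list(candidates)
```

With the toy `first_slot_recommender`, `top1_by_position(first_slot_recommender, ["h1"], "t", ["n1", "n2", "n3"])` returns `[1.0, 0.0, 0.0, 0.0]`. Averaging such vectors over many users with `mean_by_position` gives a per-slot top-1 rate; a position-insensitive recommender would show a roughly flat profile, while a large spread across slots indicates position bias.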


Code & Models

Repositories

jiangdeccc/evallmasrecommender (Official)


Taxonomy

Topics: Library Science and Information Systems · Artificial Intelligence in Law

Methods: Softmax · Attention Is All You Need