RecSys Arena: Pair-wise Recommender System Evaluation with Large Language Models

Zhuo Wu; Qinglin Jia; Chuhan Wu; Zhaocheng Du; Shuai Wang; Zan Wang; Zhenhua Dong

arXiv:2412.11068·cs.IR·December 17, 2024

TL;DR

This paper introduces RecSys Arena, a novel evaluation framework using large language models to simulate user feedback, providing more nuanced and consistent assessments of recommender systems beyond traditional offline metrics.

Contribution

It proposes leveraging LLMs as simulated users to evaluate recommendation algorithms, capturing subjective preferences and subtle differences more effectively than conventional metrics.
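
The pair-wise idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Judge` signature, the function `pairwise_win_rates`, and the stand-in `overlap_judge` are hypothetical names introduced only for illustration. In practice the judge would wrap an LLM call that role-plays the user; here a trivial keyword-overlap rule takes its place so the example runs without any external service.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical judge signature (an assumption, not taken from the paper):
# (user_profile, list_a, list_b) -> "A", "B", or "tie".
Judge = Callable[[str, Sequence[str], Sequence[str]], str]


def pairwise_win_rates(
    users: Dict[str, str],
    recs_a: Dict[str, Sequence[str]],
    recs_b: Dict[str, Sequence[str]],
    judge: Judge,
) -> Dict[str, float]:
    """Ask the judge to compare two recommendation lists per user and
    return the fraction of users for which each verdict was given."""
    if not users:
        raise ValueError("at least one user is required")
    votes: Counter = Counter()
    for user_id, profile in users.items():
        verdict = judge(profile, recs_a[user_id], recs_b[user_id])
        if verdict not in {"A", "B", "tie"}:
            raise ValueError(f"unexpected verdict: {verdict!r}")
        votes[verdict] += 1
    total = sum(votes.values())
    return {label: votes[label] / total for label in ("A", "B", "tie")}


def overlap_judge(profile: str, list_a: Sequence[str], list_b: Sequence[str]) -> str:
    """Stand-in for an LLM judge: prefers the list with more items
    that literally appear in the user's profile text."""
    text = profile.lower()
    score_a = sum(item.lower() in text for item in list_a)
    score_b = sum(item.lower() in text for item in list_b)
    if score_a == score_b:
        return "tie"
    return "A" if score_a > score_b else "B"


if __name__ == "__main__":
    users = {"u1": "likes sci-fi and jazz", "u2": "likes cooking"}
    recs_a = {"u1": ["sci-fi", "news"], "u2": ["sports"]}
    recs_b = {"u1": ["jazz", "sci-fi"], "u2": ["sports"]}
    print(pairwise_win_rates(users, recs_a, recs_b, overlap_judge))
```

Keeping the judge as a plain callable means the aggregation logic stays the same whether the verdicts come from a rule, a human annotator, or a model.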

Findings

01. LLMs' evaluations align well with offline metrics.

02. LLMs provide richer, subjective evaluation insights.

03. The method better distinguishes similar algorithms.

Abstract

Evaluating the quality of recommender systems is critical for algorithm design and optimization. Most evaluation methods rely on offline metrics to support quick algorithm iteration, since online experiments are usually risky and time-consuming. However, offline evaluation usually cannot fully reflect users' preferences for the outcomes of different recommendation algorithms, and the results may not be consistent with online A/B tests. Moreover, many offline metrics such as AUC do not offer sufficient information for comparing the subtle differences between two competitive recommender systems in different aspects, which may lead to substantial performance differences in long-term online serving. Fortunately, due to the strong commonsense knowledge and role-play capability of large language models (LLMs), it is possible to obtain simulated user feedback on offline recommendation…

Code & Models

Repositories

anonyprojects/recsys-arena
none · Official

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques · Natural Language Processing Techniques