Exploring the Potential of LLMs for Serendipity Evaluation in Recommender Systems
Li Kang, Yuhan Zhao, Li Chen

TL;DR
This paper investigates whether large language models can effectively simulate human judgment for evaluating serendipity in recommender systems, potentially offering a more accurate and cost-effective assessment method.
Contribution
It demonstrates that LLMs, especially when combined through multi-LLM techniques and supplied with auxiliary data, can match or outperform traditional proxy metrics in serendipity evaluation.
Findings
Zero-shot LLMs achieve comparable or better performance than traditional metrics.
Multi-LLM techniques and auxiliary data improve alignment with human judgments.
The best LLM evaluation setup reaches a Pearson correlation coefficient of 21.5% (r = 0.215) with the user-study judgments.
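To make the correlation figure concrete, here is a minimal sketch of how agreement between LLM-assigned serendipity scores and human ratings could be measured with the Pearson correlation coefficient. The rating lists and the function name `pearson` are hypothetical, chosen for illustration only; they are not the paper's data or code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical serendipity ratings for five recommended items (1-5 scale).
human_ratings = [1, 2, 3, 4, 5]  # made-up user-study judgments
llm_ratings = [2, 1, 4, 3, 5]    # made-up LLM-simulated judgments

r = pearson(human_ratings, llm_ratings)
print(f"Pearson r = {r:.3f}")
```

A value of r = 0.215 on this scale corresponds to the 21.5% reported in the findings: a positive but weak linear agreement with the human judgments.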
Abstract
Serendipity plays a pivotal role in enhancing user satisfaction within recommender systems, yet its evaluation poses significant challenges due to its inherently subjective nature and conceptual ambiguity. Current algorithmic approaches predominantly rely on proxy metrics for indirect assessment, often failing to align with real user perceptions, thus creating a gap. With large language models (LLMs) increasingly revolutionizing evaluation methodologies across various human annotation tasks, we are inspired to explore a core research proposition: Can LLMs effectively simulate human users for serendipity evaluation? To address this question, we conduct a meta-evaluation on two datasets derived from real user studies in the e-commerce and movie domains, focusing on three key aspects: the accuracy of LLMs compared to conventional proxy metrics, the influence of auxiliary data on LLM…
Taxonomy
Topics: Recommender Systems and Techniques · Text and Document Classification Technologies · Advanced Text Analysis Techniques
