A Universal Framework for Offline Serendipity Evaluation in Recommender Systems via Large Language Models
Yu Tokutake, Kazushi Okamoto, Kei Harada, Atsushi Shibata, Koki Karube

TL;DR
This paper introduces a universal offline evaluation framework for serendipity in recommender systems using large language models, addressing the challenge of unobservable ground truth and improving assessment accuracy.
Contribution
It proposes a novel LLM-based evaluation framework for serendipity that generalizes across datasets and RS types, with four prompt strategies compared to improve prediction accuracy.
Findings
LLMs with chain-of-thought prompts achieved the highest prediction accuracy.
Serendipity-oriented RSs do not consistently outperform general RSs across datasets.
The framework enables evaluation without relying on ground truth annotations.
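The first finding, that chain-of-thought prompting gave the most accurate serendipity predictions, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording, the function names (`build_cot_prompt`, `parse_label`, `evaluate`), the binary label format, and the use of plain accuracy as the metric are hypothetical and are not taken from the paper. The `llm` argument stands in for any text-in, text-out model call.

```python
from typing import Callable, Sequence, Tuple

# One evaluation sample: (user history, candidate item, user-annotated label).
Sample = Tuple[Sequence[str], str, bool]


def build_cot_prompt(history: Sequence[str], candidate: str) -> str:
    """Assemble a hypothetical chain-of-thought prompt (illustrative wording only)."""
    items = "\n".join(f"- {title}" for title in history)
    return (
        "A user has interacted with these items:\n"
        f"{items}\n\n"
        f"Candidate item: {candidate}\n"
        "Think step by step about whether the candidate is both unexpected "
        "and useful for this user, then answer on the last line with "
        "'serendipitous' or 'not serendipitous'."
    )


def parse_label(response: str) -> bool:
    """Read the final line of the model response as a binary judgment."""
    lines = response.strip().splitlines()
    if not lines:
        return False  # assumption: an empty response counts as "not serendipitous"
    last_line = lines[-1].lower()
    return "serendipitous" in last_line and "not serendipitous" not in last_line


def evaluate(llm: Callable[[str], str], samples: Sequence[Sample]) -> float:
    """Return the fraction of samples where the LLM judgment matches the annotation."""
    if not samples:
        raise ValueError("samples must not be empty")
    correct = 0
    for history, candidate, label in samples:
        prediction = parse_label(llm(build_cot_prompt(history, candidate)))
        correct += prediction == label
    return correct / len(samples)
```

To compare prompt strategies under this sketch, one would swap `build_cot_prompt` for another prompt builder and rerun `evaluate` on the same annotated samples.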
Abstract
Serendipity in recommender systems (RSs) has attracted increasing attention as a concept that enhances user satisfaction by presenting unexpected and useful items. However, evaluating serendipitous performance remains challenging because its ground truth is generally unobservable. Existing offline metrics often depend on ambiguous definitions or are tailored to specific datasets and RSs, which limits their generalizability. To address this issue, we propose a universally applicable evaluation framework that uses large language models (LLMs), known for their extensive knowledge and reasoning capabilities, as evaluators. First, to improve the evaluation performance of the proposed framework, we assessed the serendipity prediction accuracy of LLMs using four different prompt strategies on a dataset containing user-annotated serendipitous ground truth and found that the…
