A Universal Framework for Offline Serendipity Evaluation in Recommender Systems via Large Language Models

Yu Tokutake; Kazushi Okamoto; Kei Harada; Atsushi Shibata; Koki Karube

arXiv:2508.17571 · cs.IR · August 26, 2025

TL;DR

This paper introduces a universal offline evaluation framework for serendipity in recommender systems that uses large language models as evaluators, addressing the problem that serendipity ground truth is generally unobservable.

Contribution

The paper proposes an LLM-based evaluation framework for serendipity that generalizes across datasets and RS types, and compares four prompt strategies to find the one with the best serendipity prediction accuracy.

Findings

01

LLMs with chain-of-thought prompts achieved the highest prediction accuracy.

02

Serendipity-oriented RSs do not consistently outperform general RSs across datasets.

03

The framework enables evaluation without relying on ground-truth annotations.

Abstract

Serendipity in recommender systems (RSs) has attracted increasing attention as a concept that enhances user satisfaction by presenting unexpected and useful items. However, evaluating serendipitous performance remains challenging because its ground truth is generally unobservable. Existing offline metrics often depend on ambiguous definitions or are tailored to specific datasets and RSs, which limits their generalizability. To address this issue, we propose a universally applicable evaluation framework that leverages large language models (LLMs), known for their extensive knowledge and reasoning capabilities, as evaluators. First, to improve the evaluation performance of the proposed framework, we assessed the serendipity prediction accuracy of LLMs using four different prompt strategies on a dataset containing user-annotated serendipitous ground truth and found that the…
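
To make the LLM-as-evaluator idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a chain-of-thought style prompt that asks whether a recommended item is both unexpected and useful, a binary yes/no verdict, and a simple per-list aggregate. The prompt wording, the function names (build_cot_prompt, parse_verdict, serendipity_rate), and the aggregate itself are hypothetical; the paper's actual prompts and metrics may differ. The llm argument stands for any callable that maps a prompt string to a response string.

```python
import re
from typing import Callable, Sequence


def build_cot_prompt(history: Sequence[str], candidate: str) -> str:
    """Assemble a chain-of-thought style prompt (wording is hypothetical)."""
    lines = ["A user has interacted with the following items:"]
    lines += [f"- {title}" for title in history]
    lines.append(f"Recommended item: {candidate}")
    lines.append(
        "Think step by step about whether this item is both unexpected "
        "and useful for the user. Finish with 'Answer: yes' or 'Answer: no'."
    )
    return "\n".join(lines)


def parse_verdict(response: str) -> bool:
    """Return True only if the last 'Answer:' marker in the response says yes."""
    matches = re.findall(r"answer:\s*(yes|no)", response, flags=re.IGNORECASE)
    return bool(matches) and matches[-1].lower() == "yes"


def serendipity_rate(
    history: Sequence[str],
    recommendations: Sequence[str],
    llm: Callable[[str], str],
) -> float:
    """Fraction of recommended items the LLM judges serendipitous.

    This aggregate is an assumed illustration, not a metric from the paper.
    """
    if not recommendations:
        return 0.0
    hits = sum(
        parse_verdict(llm(build_cot_prompt(history, item)))
        for item in recommendations
    )
    return hits / len(recommendations)
```

Because the evaluator is passed in as a plain callable, the same scoring loop can be run over the output of any recommender and any dataset that exposes item titles, which is the kind of generality the framework aims for. Responses without a recognizable answer marker are counted as not serendipitous in this sketch; a real pipeline would probably retry or log them.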
