On the Reliability of Sampling Strategies in Offline Recommender Evaluation

Bruno L. Pereira; Alan Said; Rodrygo L. T. Santos

arXiv:2508.05398 · cs.IR · August 12, 2025

TL;DR

This paper examines how different sampling strategies impact the reliability of offline recommender system evaluation, providing insights and guidance to improve evaluation fidelity under exposure biases.

Contribution

It systematically analyzes the effects of logging and sampling choices on offline evaluation reliability using a fully observed dataset as ground truth.

Findings

01. Sampling strategies vary in their ability to distinguish between models.

02. Certain sampling methods maintain higher fidelity and robustness.

03. Guidelines are provided for selecting effective sampling strategies.

Abstract

Offline evaluation plays a central role in benchmarking recommender systems when online testing is impractical or risky. However, it is susceptible to two key sources of bias: exposure bias, where users only interact with items they are shown, and sampling bias, introduced when evaluation is performed on a subset of logged items rather than the full catalog. While prior work has proposed methods to mitigate sampling bias, these are typically assessed on fixed logged datasets rather than for their ability to support reliable model comparisons under varying exposure conditions or relative to true user preferences. In this paper, we investigate how different combinations of logging and sampling choices affect the reliability of offline evaluation. Using a fully observed dataset as ground truth, we systematically simulate diverse exposure biases and assess the reliability of common sampling…
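
To make the notion of sampling bias concrete, the following is a minimal sketch, not the paper's protocol. It assumes a toy catalog of 1,000 items with random scores and uniform negative sampling; the helper names (`rank_of_target`, `hit_at_k`, `sample_negatives`) are hypothetical and chosen only for illustration. It compares the rank of one held-out item against the full catalog with its rank against a small sampled candidate set.

```python
import random


def rank_of_target(scores, target, candidates):
    """Return the 1-based rank of `target` among `candidates` by descending score."""
    target_score = scores[target]
    better = sum(
        1 for item in candidates
        if item != target and scores[item] > target_score
    )
    return 1 + better


def hit_at_k(scores, target, candidates, k):
    """Return 1.0 if `target` ranks within the top k of `candidates`, else 0.0."""
    return 1.0 if rank_of_target(scores, target, candidates) <= k else 0.0


def sample_negatives(catalog, target, n, rng):
    """Draw n distinct non-target items uniformly at random (an assumed strategy)."""
    negatives = [item for item in catalog if item != target]
    return rng.sample(negatives, n)


rng = random.Random(0)
catalog = list(range(1000))

# Toy model scores; a real recommender would produce these per user.
scores = {item: rng.random() for item in catalog}

# Pick the held-out item so that its full-catalog rank is 50.
ranked = sorted(catalog, key=lambda item: scores[item], reverse=True)
target = ranked[49]

# Full evaluation: rank the target against every item in the catalog.
full_rank = rank_of_target(scores, target, catalog)

# Sampled evaluation: rank the target against 99 sampled negatives only.
sampled_candidates = sample_negatives(catalog, target, 99, rng) + [target]
sampled_rank = rank_of_target(scores, target, sampled_candidates)

print("full-catalog rank:", full_rank)
print("sampled rank:", sampled_rank)
print("hit@10 (full):", hit_at_k(scores, target, catalog, 10))
print("hit@10 (sampled):", hit_at_k(scores, target, sampled_candidates, 10))
```

Because the sampled candidates are a subset of the catalog, the sampled rank can never be worse than the full-catalog rank, so sampled metrics tend to look more optimistic than full-catalog ones. Whether they still order competing models the same way is the reliability question the abstract raises.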
