TL;DR
This paper critically examines data leakage in offline recommender system evaluation, demonstrating its impact on model performance and proposing a timeline-based evaluation scheme for more realistic assessments.
Contribution
It provides a comprehensive analysis of the causes of data leakage and its effects on model evaluation, and it introduces a timeline scheme to improve offline recommendation assessment.
Findings
Data leakage causes models to recommend future items that are not yet available at the time point of the test instance.
Data leakage affects the accuracy and relative performance of recommender models.
A timeline scheme for evaluation offers a more realistic assessment of recommender systems.
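To make the first finding concrete, the following is a minimal sketch of how one might flag "future items" in a recommendation list. It rests on an assumption that is not taken from the paper: an item's release time is approximated by the timestamp of its first observed interaction. The function names `item_first_seen` and `future_items` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

# (user, item, timestamp); timestamps are plain integers in this toy example.
Interaction = Tuple[str, str, int]


def item_first_seen(interactions: List[Interaction]) -> Dict[str, int]:
    """Approximate each item's release time by its earliest interaction.

    Assumption: the dataset has no explicit release dates, so the first
    observed interaction stands in for availability.
    """
    first_seen: Dict[str, int] = {}
    for _user, item, ts in interactions:
        if item not in first_seen or ts < first_seen[item]:
            first_seen[item] = ts
    return first_seen


def future_items(recommended: List[str], test_time: int,
                 first_seen: Dict[str, int]) -> List[str]:
    """Return recommended items first seen only after the test instance."""
    return [item for item in recommended
            if item in first_seen and first_seen[item] > test_time]


interactions = [
    ("u1", "a", 1),
    ("u2", "b", 2),
    ("u1", "b", 3),
    ("u3", "c", 5),
    ("u2", "c", 6),
]
first_seen = item_first_seen(interactions)

# A test instance at time 3: item "c" first appears at time 5, so a model
# recommending it has used information from the future.
leaked = future_items(["a", "c", "b"], test_time=3, first_seen=first_seen)
print(leaked)
```

A non-empty result for a test instance suggests that the model saw interactions that would not have been available at prediction time.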
Abstract
Recommender models are hard to evaluate, particularly in the offline setting. In this paper, we provide a comprehensive and critical analysis of the data leakage issue in recommender system offline evaluation. Data leakage is caused by not observing the global timeline when evaluating recommenders, e.g., when the train/test data split does not follow the global timeline. As a result, a model learns from user-item interactions that are not expected to be available at prediction time. We first show the temporal dynamics of user-item interactions along the global timeline, then explain why data leakage exists for collaborative filtering models. Through carefully designed experiments, we show that all models indeed recommend future items that are not available at the time point of a test instance, as a result of data leakage. The experiments are conducted with four widely used baseline models - BPR, NeuMF,…
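The leakage mechanism described in the abstract can be illustrated with a small sketch. It compares a per-user leave-last-one-out split, used here as an assumed example of a split that ignores the global timeline, with a split at a single global time point. The global split is only a simple stand-in and is not claimed to be the paper's timeline scheme; all function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Interaction = Tuple[str, str, int]  # (user, item, timestamp)


def leave_last_one_out(interactions: List[Interaction]):
    """Per-user split: each user's latest interaction becomes a test instance."""
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)
    train, test = [], []
    for rows in by_user.values():
        rows = sorted(rows, key=lambda r: r[2])
        train.extend(rows[:-1])
        test.append(rows[-1])
    return train, test


def global_timeline_split(interactions: List[Interaction], split_time: int):
    """Global split: everything before split_time trains, the rest tests."""
    train = [r for r in interactions if r[2] < split_time]
    test = [r for r in interactions if r[2] >= split_time]
    return train, test


def count_leaked(train: List[Interaction], test: List[Interaction]) -> int:
    """Count training interactions that occur after the earliest test instance."""
    earliest_test = min(r[2] for r in test)
    return sum(1 for r in train if r[2] > earliest_test)


interactions = [
    ("u1", "a", 1), ("u1", "b", 2),
    ("u2", "a", 3), ("u2", "c", 8),
    ("u3", "b", 9), ("u3", "c", 10),
]

loo_leaked = count_leaked(*leave_last_one_out(interactions))
global_leaked = count_leaked(*global_timeline_split(interactions, split_time=8))
print(loo_leaked, global_leaked)
```

In the per-user split, u1's test instance at time 2 is evaluated by a model trained on interactions from times 3 and 9, which is the kind of future information the abstract refers to. The global split has no such overlap by construction.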
