Where Do We Go From Here? Guidelines For Offline Recommender Evaluation
Tobias Schnabel

TL;DR
This paper identifies key issues in offline recommender system evaluation, proposes guidelines and a toolkit called TrainRec for standardized experimentation, and demonstrates its effectiveness through extensive baseline testing.
Contribution
The paper introduces practical guidelines for offline recommender evaluation and presents TrainRec, a flexible toolkit that implements these guidelines for more reliable experimentation.
Findings
Many results on small datasets are not statistically significant.
At least three baselines perform well across most datasets.
Enhanced uncertainty quantification can invalidate some reported method differences.
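This summary does not describe the paper's exact uncertainty-estimation procedure. The sketch below shows one common way to check whether a reported difference between two recommenders survives uncertainty quantification: a paired percentile bootstrap over test users. It assumes per-user metric scores (for example, NDCG) are available for both systems on the same users. The function name `paired_bootstrap_ci` and the example scores are hypothetical and are not part of TrainRec's API.

```python
import math
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean per-user difference (A minus B).

    Hypothetical helper, not part of TrainRec. scores_a[i] and scores_b[i]
    must be the metric values of the two recommenders for the same test
    user i (a paired design).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("expected two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the interval is reproducible

    # Resample users with replacement; record the mean difference each time.
    boot_means = sorted(
        math.fsum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples)
    )

    # Nearest-rank percentiles of the bootstrap distribution.
    lo_idx = round((alpha / 2) * (n_resamples - 1))
    hi_idx = round((1 - alpha / 2) * (n_resamples - 1))
    point = math.fsum(diffs) / n
    return point, boot_means[lo_idx], boot_means[hi_idx]


if __name__ == "__main__":
    # Hypothetical per-user scores for two recommenders on a tiny test set.
    ndcg_a = [0.42, 0.10, 0.55, 0.31, 0.00, 0.67, 0.25, 0.48]
    ndcg_b = [0.40, 0.15, 0.50, 0.33, 0.05, 0.60, 0.20, 0.45]
    est, lo, hi = paired_bootstrap_ci(ndcg_a, ndcg_b)
    print(f"mean diff = {est:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
    print("interval excludes 0" if lo > 0 or hi < 0 else "interval includes 0")
```

If the interval includes zero, the observed difference is not distinguishable from noise at that level. On small test sets this outcome is common, which is consistent with the finding above. Resampling users rather than individual interactions keeps each user's scores for both systems together, so the comparison stays paired.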
Abstract
Various studies in recent years have pointed out large issues in the offline evaluation of recommender systems, making it difficult to assess whether true progress has been made. However, there has been little research into what set of practices should serve as a starting point during experimentation. In this paper, we examine four larger issues in recommender system research regarding uncertainty estimation, generalization, hyperparameter optimization and dataset pre-processing in more detail to arrive at a set of guidelines. We present TrainRec, a lightweight and flexible toolkit for offline training and evaluation of recommender systems that implements these guidelines. Different from other frameworks, TrainRec is a toolkit that focuses on experimentation alone, offering flexible modules that can be used together or in isolation. Finally, we demonstrate TrainRec's…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Data Classification · Data Stream Mining Techniques
