On the Difficulty of Evaluating Baselines: A Study on Recommender Systems
Steffen Rendle, Li Zhang, Yehuda Koren

TL;DR
This paper highlights the challenges in properly evaluating baselines in recommender systems, showing that suboptimal baseline results are common and emphasizing the need for standardized, well-tuned benchmarks for reliable comparisons.
Contribution
The study demonstrates that careful tuning of baselines can significantly improve their results, which calls into question the validity of many published findings in recommender systems research.
Findings
Suboptimal baseline results are prevalent in the literature.
Properly tuned baselines can outperform recently proposed methods.
Standardized benchmarks are essential for reliable evaluations.
Abstract
Numerical evaluations with comparisons to baselines play a central role when judging research in recommender systems. In this paper, we show that running baselines properly is difficult. We demonstrate this issue on two extensively studied datasets. First, we show that results for baselines that have been used in numerous publications over the past five years for the Movielens 10M benchmark are suboptimal. With a careful setup of a vanilla matrix factorization baseline, we are not only able to improve upon the reported results for this baseline but even outperform the reported results of any newly proposed method. Secondly, we recap the tremendous effort that was required by the community to obtain high quality results for simple methods on the Netflix Prize. Our results indicate that empirical findings in research papers are questionable unless they were obtained on standardized…
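To make the idea of a "careful setup of a vanilla matrix factorization baseline" more concrete, here is a minimal sketch of matrix factorization trained with SGD and then tuned over a small hyperparameter grid. It is not the authors' implementation: the bias terms, the function names (`train_mf`, `rmse`, `tune`), and all default values (`dim`, `lr`, `reg`, `epochs`) are assumptions chosen for illustration only.

```python
import math
import random


def train_mf(ratings, n_users, n_items, dim=4, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit a biased matrix factorization with plain SGD.

    `ratings` is a list of (user, item, rating) triples. All defaults are
    illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users  # user biases
    bi = [0.0] * n_items  # item biases
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the training triples in a new order each epoch
        for u, i, r in data:
            pred = mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))
            err = r - pred
            # Gradient step on the L2-regularized squared error.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for k in range(dim):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return mu, bu, bi, P, Q


def predict(model, u, i):
    """Predicted rating of item i by user u."""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(model, ratings):
    """Root mean squared error of the model on a list of rating triples."""
    sq = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))


def tune(train, valid, n_users, n_items, grid):
    """Train once per setting in `grid`; return (best validation RMSE, its params)."""
    best = None
    for params in grid:
        model = train_mf(train, n_users, n_items, **params)
        score = rmse(model, valid)
        if best is None or score < best[0]:
            best = (score, params)
    return best
```

Calling `tune` with a list of keyword dictionaries such as `{"dim": 4, "reg": 0.02}` compares settings on a held-out validation split. The point of the sketch is only that the reported number for a baseline depends on which settings were searched, so a grid that is too narrow can make the baseline look weaker than it is.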
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
