TL;DR
This paper critically examines the standard popularity baseline in recommender systems, shows that its usual evaluation ignores time, and proposes a time-aware evaluation under which popularity performs 70% or more better on MovieLens.
Contribution
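It surveys how 12 papers from top-tier conferences and 6 open-source toolkits define the popularity baseline, identifies two flaws in the common MostPop evaluation, and introduces a revised baseline that ranks items by their popularity at the time a user interacts with the system.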
Findings
On MovieLens, the popularity baseline's performance improves by 70% or more when popularity is measured at the time of each user interaction.
Users who rate few movies tend to follow popular trends, while avid raters focus on personal preferences.
Current popularity baselines may recommend items released after a user's last interaction with the system, skewing evaluation results.
Abstract
Popularity is often included in experimental evaluation to provide a reference performance for a recommendation task. To understand how popularity baseline is defined and evaluated, we sample 12 papers from top-tier conferences including KDD, WWW, SIGIR, and RecSys, and 6 open source toolkits. We note that the widely adopted MostPop baseline simply ranks items based on the number of interactions in the training data. We argue that the current evaluation of popularity (i) does not reflect the popular items at the time when a user interacts with the system, and (ii) may recommend items released after a user's last interaction with the system. On the widely used MovieLens dataset, we show that the performance of popularity could be significantly improved by 70% or more, if we consider the popular items at the time point when a user interacts with the system. We further show that, on…
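The contrast between the two baselines can be illustrated with a minimal Python sketch. `most_pop` follows the abstract's description of MostPop (rank items by interaction count over the whole training set). `recent_pop` is a hypothetical time-aware variant that counts only interactions in a fixed window before the recommendation time `t`. The window-based definition, the function names, and the toy data are assumptions for illustration, not the paper's exact method.

```python
from collections import Counter

# Toy (user, item, timestamp) interactions; illustrative only, not from the paper.
interactions = [
    ("u1", "A", 1), ("u2", "A", 2), ("u3", "A", 3),
    ("u4", "B", 8), ("u5", "B", 9),
    ("u6", "C", 9),
]

def most_pop(interactions, k):
    """MostPop: rank items by total interaction count in the training data."""
    counts = Counter(item for _, item, _ in interactions)
    return [item for item, _ in counts.most_common(k)]

def recent_pop(interactions, t, window, k):
    """Hypothetical time-aware popularity: count only interactions with
    timestamp in [t - window, t). Items that first appear at or after t
    cannot be recommended, since they have no interactions before t."""
    counts = Counter(
        item for _, item, ts in interactions if t - window <= ts < t
    )
    return [item for item, _ in counts.most_common(k)]

print(most_pop(interactions, 1))            # globally most-interacted item
print(recent_pop(interactions, 10, 3, 1))   # most popular item just before t=10
print(recent_pop(interactions, 4, 3, 3))    # at t=4, B and C do not exist yet
```

In this toy data the global ranking favors item A, even though all of its interactions are old by time 10, while the windowed ranking favors B. The windowed version also never returns B or C at time 4, which avoids recommending items released after the user's interaction.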
