TL;DR
This paper investigates how the choice of IR metric to optimize affects ranking-based recommender systems. It finds that RBP-inspired loss functions often outperform losses based on other metrics, which challenges the common practice of optimizing the same metric that is used for evaluation.
Contribution
It introduces novel RBP-inspired loss functions and provides extensive experimental evidence of their advantages over losses based on traditional metrics in recommendation tasks.
Findings
RBP-inspired losses perform at least as well as losses based on traditional metrics.
Training on the same metric that is used for evaluation is not always optimal.
User activity level influences the benefit gained from RBP-based optimization.
Abstract
Direct optimization of IR metrics has often been adopted as an approach to devise and develop ranking-based recommender systems. Most methods following this approach aim at optimizing the same metric being used for evaluation, under the assumption that this will lead to the best performance. A number of studies of this practice bring this assumption, however, into question. In this paper, we dig deeper into this issue in order to learn more about the effects of the choice of the metric to optimize on the performance of a ranking-based recommender system. We present an extensive experimental study conducted on different datasets in both pairwise and listwise learning-to-rank scenarios, to compare the relative merit of four popular IR metrics, namely RR, AP, nDCG and RBP, when used for optimization and assessment of recommender systems in various combinations. For the first three, we…
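For readers unfamiliar with the four metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions for a single ranked list. It is not the paper's loss functions, which are optimized during training and are not reproduced here. The function names, the binary relevance list `rels`, and the default persistence `p=0.8` are illustrative assumptions. AP and nDCG are normalized using only the items present in the list, which is also an assumption.

```python
import math

# rels: relevance labels in ranked order (position 1 first); 1 = relevant, 0 = not.

def reciprocal_rank(rels):
    """RR: inverse of the position of the first relevant item."""
    for i, r in enumerate(rels, start=1):
        if r:
            return 1.0 / i
    return 0.0

def average_precision(rels):
    """AP: mean of precision@k over the positions k that hold a relevant item.
    Assumption: every relevant item appears somewhere in rels."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def ndcg(rels):
    """nDCG: log-discounted gain divided by the gain of the ideal ordering.
    Assumption: the ideal ordering is built from the same list."""
    def dcg(xs):
        return sum(x / math.log2(i + 1) for i, x in enumerate(xs, start=1))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0

def rbp(rels, p=0.8):
    """RBP: geometrically discounted gain with user persistence p (assumed default)."""
    return (1 - p) * sum(r * p ** (i - 1) for i, r in enumerate(rels, start=1))

if __name__ == "__main__":
    ranking = [0, 1, 0, 1]
    print(reciprocal_rank(ranking), average_precision(ranking),
          ndcg(ranking), rbp(ranking))
```

RBP differs from the other three in that its discount depends only on rank position and `p`, not on how many relevant items the list contains.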
