On Sampling Top-K Recommendation Evaluation
Dong Li, Ruoming Jin, Jing Gao, Zhi Liu

TL;DR
This paper investigates the relationship between sampling-based and global top-$K$ recommendation metrics, demonstrating that sampling metrics can reliably approximate and predict the true top-$K$ performance.
Contribution
It provides a theoretical and empirical analysis showing that sampling top-$k$ metrics accurately reflect global top-$K$ metrics and can predict which recommendation algorithms win.
Findings
Sampling top-$k$ Hit-Ratio closely approximates global top-$K$ Hit-Ratio.
Sampling metrics can reliably predict the best-performing recommendation algorithms.
Theoretical and experimental validation supports the use of sampling metrics in evaluation.
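To make the two metrics concrete, here is a minimal sketch that is not taken from the paper. It assumes one held-out relevant item per user, a global rank $r$ for that item among all $N$ items (rank 1 = best), and a sampling evaluation that draws $n$ negatives uniformly without replacement from the other $N-1$ items. Under these assumptions, the number of sampled negatives that outrank the target is hypergeometric, so the expected sampled Hit-Ratio can be computed exactly. The function names `global_hit_ratio` and `expected_sampling_hit_ratio` are hypothetical.

```python
from math import comb
from typing import Sequence


def global_hit_ratio(ranks: Sequence[int], K: int) -> float:
    """Global HR@K: fraction of users whose held-out item has global rank <= K (1 = best)."""
    return sum(r <= K for r in ranks) / len(ranks)


def expected_sampling_hit_ratio(
    ranks: Sequence[int], n_items: int, n_samples: int, k: int
) -> float:
    """Expected sampled HR@k, assuming n_samples negatives are drawn uniformly
    without replacement from the n_items - 1 non-target items (an assumption).

    A target with global rank r is outranked by r - 1 negatives, so the number X
    of sampled negatives above it is Hypergeometric(n_items - 1, r - 1, n_samples).
    The target is a sampled hit when X < k, i.e. its sampled rank is <= k.
    """
    population = n_items - 1
    total = comb(population, n_samples)
    hr = 0.0
    for r in ranks:
        above = r - 1  # negatives scored higher than the target
        # Sum P(X = x) for x = 0 .. min(k - 1, n_samples); comb returns 0 when x > above.
        p_hit = sum(
            comb(above, x) * comb(population - above, n_samples - x)
            for x in range(min(k, n_samples + 1))
        ) / total
        hr += p_hit
    return hr / len(ranks)
```

For example, with ranks `[1, 5, 50]` over 100 items and only 10 sampled negatives, `expected_sampling_hit_ratio(..., k=1)` is well above `global_hit_ratio(..., K=1)`, which is why a sampled top-$k$ must be compared against a larger global top-$K$. When every negative is sampled (`n_samples = n_items - 1`), the two quantities coincide.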
Abstract
Recently, Rendle has warned that the use of sampling-based top-$k$ metrics might not suffice. This throws into jeopardy a number of recent studies on deep learning-based recommendation algorithms, as well as studies on classic non-deep-learning algorithms, that rely on such metrics. In this work, we thoroughly investigate the relationship between the sampling and global top-$K$ Hit-Ratio (HR, or Recall), originally proposed by Koren [2] and extensively used by others. By formulating the problem of aligning the sampling top-$k$ ($SHR@k$) and global top-$K$ ($HR@K$) Hit-Ratios through a mapping function $f$, so that $SHR@k \approx HR@f(k)$, we demonstrate both theoretically and experimentally that the sampling top-$k$ Hit-Ratio provides an accurate approximation of its global (exact) counterpart, and can consistently predict the correct winners (the same as indicated by their corresponding global Hit-Ratios).
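To give some intuition for why such a mapping must stretch $k$ into a larger $K$, here is a back-of-envelope heuristic. It is not the mapping derived in the paper. Write $SHR@k$ for the sampling Hit-Ratio and $HR@K$ for the global one. Assume (our assumption) $N$ items, $n$ negatives sampled uniformly without replacement, and a held-out item with global rank $r_u$. The number $X_u$ of sampled negatives ranked above it then satisfies:

$$
X_u \sim \mathrm{Hypergeometric}\bigl(N-1,\; r_u-1,\; n\bigr), \qquad \mathbb{E}[X_u] = (r_u-1)\,\frac{n}{N-1}.
$$

The item is a sampled hit when $X_u < k$. Ignoring the variance of $X_u$ and placing the hit boundary where $\mathbb{E}[X_u] \approx k$ gives $r_u - 1 \approx k(N-1)/n$, which suggests the crude approximation

$$
SHR@k \approx HR@f(k), \qquad f(k) \approx \frac{k\,(N-1)}{n}.
$$

For instance, with $N = 10{,}000$ items and $n = 100$ sampled negatives, this heuristic puts sampled top-1 roughly on par with global top-100. Because it drops the spread of $X_u$, it should be read only as an order-of-magnitude guide.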
