SVP-CF: Selection via Proxy for Collaborative Filtering Data
Noveen Sachdeva, Carole-Jean Wu, Julian McAuley

TL;DR
This paper investigates how different dataset sampling strategies impact recommendation algorithm performance and introduces SVP-CF, a data-specific sampling method that better preserves model rankings, especially in long-tail data.
Contribution
The paper characterizes how dataset sampling affects recommendation algorithm performance and proposes SVP-CF, a sampling strategy designed to preserve the relative performance ranking of algorithms as compared to training on the complete dataset.
Findings
SVP-CF outperforms common sampling methods in preserving algorithm rankings.
Sampling strategies significantly influence the perceived effectiveness of recommendation models.
SVP-CF is particularly effective for long-tail interaction datasets.
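One way to make "preserving algorithm rankings" concrete is to compare how algorithms rank on the full dataset with how they rank on a sample. The sketch below uses Kendall's tau for that comparison. This is an assumption chosen for illustration, since this summary does not say which measure the paper uses. The algorithm names and metric values are hypothetical, not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {algorithm: score} dicts with the same keys.

    Returns 1.0 when both dicts order the algorithms identically and
    -1.0 when one order is the exact reverse of the other.
    """
    names = sorted(scores_a)
    concordant = discordant = 0
    for x, y in combinations(names, 2):
        # Positive product: both dicts agree on which of x, y scores higher.
        sign = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(names) * (len(names) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical metric values (higher is better); not taken from the paper.
full_data = {"algo_a": 0.30, "algo_b": 0.25, "algo_c": 0.20, "algo_d": 0.10}
sample_1 = {"algo_a": 0.22, "algo_b": 0.18, "algo_c": 0.15, "algo_d": 0.05}
sample_2 = {"algo_a": 0.10, "algo_b": 0.18, "algo_c": 0.15, "algo_d": 0.05}

tau_1 = kendall_tau(full_data, sample_1)  # sample keeps the full-data order
tau_2 = kendall_tau(full_data, sample_2)  # algo_a falls from 1st to 3rd
```

Under this measure, a sample that lowers every score but keeps the order still counts as fully rank-preserving. A sample that reorders algorithms is penalized even if its absolute scores are closer to the full-data values.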
Abstract
We study the practical consequences of dataset sampling strategies on the performance of recommendation algorithms. Recommender systems are generally trained and evaluated on samples of larger datasets. Samples are often taken in a naive or ad-hoc fashion: e.g. by sampling a dataset randomly or by selecting users or items with many interactions. As we demonstrate, commonly-used data sampling schemes can have significant consequences on algorithm performance -- masking performance deficiencies in algorithms or altering the relative performance of algorithms, as compared to models trained on the complete dataset. Following this observation, this paper makes the following main contributions: (1) characterizing the effect of sampling on algorithm performance, in terms of algorithm and dataset characteristics (e.g. sparsity characteristics, sequential dynamics, etc.); and (2) designing…
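The two naive schemes the abstract mentions, sampling at random and keeping the users with many interactions, can be sketched as follows. This is a minimal illustration on hypothetical toy data, not the paper's code. The function names, the `(user, item)` tuple format, and the `fraction` parameters are all assumptions.

```python
import random
from collections import Counter


def random_interaction_sample(interactions, fraction, seed=0):
    """Keep a uniformly random fraction of the (user, item) interactions."""
    rng = random.Random(seed)
    k = int(len(interactions) * fraction)
    return rng.sample(interactions, k)


def head_user_sample(interactions, fraction):
    """Keep every interaction of the most active `fraction` of users."""
    counts = Counter(user for user, _ in interactions)
    n_keep = max(1, int(len(counts) * fraction))
    keep = {user for user, _ in counts.most_common(n_keep)}
    return [(user, item) for user, item in interactions if user in keep]


# Hypothetical toy log: user 0 has 6 interactions, user 1 has 3,
# user 2 has 2, and user 3 has 1.
interactions = (
    [(0, i) for i in range(6)]
    + [(1, i) for i in range(3)]
    + [(2, i) for i in range(2)]
    + [(3, 0)]
)

random_sample = random_interaction_sample(interactions, fraction=0.5)
head_sample = head_user_sample(interactions, fraction=0.5)
head_users = {user for user, _ in head_sample}
```

Even on this toy log the two schemes produce different data. The random sample thins every user's history, while the head-user sample drops the least active users entirely and is therefore denser than the original.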
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
