SVP-CF: Selection via Proxy for Collaborative Filtering Data

Noveen Sachdeva; Carole-Jean Wu; Julian McAuley

arXiv:2107.04984·cs.IR·July 13, 2021·1 cites

Open Access

TL;DR

This paper investigates how different dataset sampling strategies affect the performance of recommendation algorithms and introduces SVP-CF, a data-specific sampling method that preserves the relative ranking of models better than common sampling schemes, especially on long-tail data.

Contribution

The paper characterizes the effects of sampling on recommendation algorithms and proposes SVP-CF, a novel sampling strategy that maintains relative performance rankings.

Findings

01. SVP-CF outperforms common sampling methods in preserving algorithm rankings.

02. Sampling strategies significantly influence the perceived effectiveness of recommendation models.

03. SVP-CF is particularly effective for long-tail interaction datasets.
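
One way to make "preserving algorithm rankings" concrete is a rank-correlation score between algorithm performance on the full dataset and on a sample. The sketch below uses Kendall's tau for this; the choice of metric, the function name `kendall_tau`, and all scores are illustrative assumptions, not taken from this page.

```python
from itertools import combinations

def kendall_tau(full_scores, sample_scores):
    """Kendall's tau-a between two score dicts keyed by algorithm name.

    Assumes both dicts share the same keys and contain at least two algorithms.
    1.0 means the sample ranks the algorithms exactly as the full data does.
    """
    names = sorted(full_scores)
    concordant = discordant = 0
    for a, b in combinations(names, 2):
        # A pair is concordant when both datasets agree on which algorithm wins.
        product = (full_scores[a] - full_scores[b]) * (sample_scores[a] - sample_scores[b])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(names) * (len(names) - 1) // 2
    return (concordant - discordant) / n_pairs

# Made-up metric values for three hypothetical algorithms.
full = {"pop": 0.10, "mf": 0.20, "neural": 0.30}
same_order = {"pop": 0.05, "mf": 0.12, "neural": 0.25}  # ranking preserved
flipped = {"pop": 0.05, "mf": 0.25, "neural": 0.12}     # top two swapped

tau_same = kendall_tau(full, same_order)
tau_flipped = kendall_tau(full, flipped)
```

A sampling scheme that leaves the ordering intact scores 1.0 here, while each swapped pair of algorithms pulls the score down.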

Abstract

We study the practical consequences of dataset sampling strategies on the performance of recommendation algorithms. Recommender systems are generally trained and evaluated on samples of larger datasets. Samples are often taken in a naive or ad-hoc fashion: e.g. by sampling a dataset randomly or by selecting users or items with many interactions. As we demonstrate, commonly-used data sampling schemes can have significant consequences on algorithm performance -- masking performance deficiencies in algorithms or altering the relative performance of algorithms, as compared to models trained on the complete dataset. Following this observation, this paper makes the following main contributions: (1) characterizing the effect of sampling on algorithm performance, in terms of algorithm and dataset characteristics (e.g. sparsity characteristics, sequential dynamics, etc.); and (2) designing…
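
The two naive schemes named in the abstract (sampling randomly, and selecting users with many interactions) can be sketched as follows. This is a minimal illustration under assumed names and toy data; it does not implement SVP-CF, and the functions `sample_interactions_randomly` and `sample_head_users` are hypothetical.

```python
import random
from collections import Counter

def sample_interactions_randomly(interactions, fraction, seed=0):
    """Keep a uniformly random fraction of (user, item) interactions."""
    rng = random.Random(seed)
    k = int(len(interactions) * fraction)
    return rng.sample(interactions, k)

def sample_head_users(interactions, fraction):
    """Keep all interactions of the most active fraction of users."""
    counts = Counter(user for user, _ in interactions)
    n_keep = max(1, int(len(counts) * fraction))
    keep = {user for user, _ in counts.most_common(n_keep)}
    return [(user, item) for user, item in interactions if user in keep]

# Toy interaction log: one heavy user (u1) and three users with one interaction each.
interactions = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "a"), ("u3", "b"), ("u4", "c"),
]

random_sample = sample_interactions_randomly(interactions, fraction=0.5)
head_sample = sample_head_users(interactions, fraction=0.25)
```

On this toy log the head-user sample keeps only `u1` and discards every long-tail user, which is the kind of distortion the abstract warns can alter how algorithms compare.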

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques