A Case Study on Sampling Strategies for Evaluating Neural Sequential Item Recommendation Models
Alexander Dallmann, Daniel Zoller, Andreas Hotho

TL;DR
This study investigates how two negative-sampling strategies, uniform random sampling and sampling by popularity, affect the evaluation and ranking of neural sequential item recommendation models across multiple datasets.
Contribution
It compares how each sampling strategy affects model rankings and highlights where those rankings are inconsistent with the ranking obtained from full evaluation.
Findings
Both sampling strategies can lead to different model rankings compared to full evaluation.
Sampling by popularity and uniform random sampling do not always produce consistent rankings.
The choice of sampling strategy can change how well a recommendation model appears to perform relative to other models.
Abstract
At the present time, sequential item recommendation models are compared by calculating metrics on a small item subset (target set) to speed up computation. The target set contains the relevant item and a set of negative items that are sampled from the full item set. Two well-known strategies to sample negative items are uniform random sampling and sampling by popularity to better approximate the item frequency distribution in the dataset. Most recently published papers on sequential item recommendation rely on sampling by popularity to compare the evaluated models. However, recent work has already shown that an evaluation with uniform random sampling may not be consistent with the full ranking, that is, the model ranking obtained by evaluating a metric using the full item set as target set, which raises the question whether the ranking obtained by sampling by popularity is equal to the…
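The target-set evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `sample_negatives` and `hit_at_k`, the toy interaction counts, and the choice of a hit-rate style metric are all assumptions made for the example.

```python
import random
from collections import Counter


def sample_negatives(item_counts, positive, n, strategy="uniform", rng=None):
    """Draw n distinct negative items (hypothetical helper, not from the paper).

    item_counts maps item id -> interaction count in the dataset.
    """
    rng = rng or random.Random(0)
    candidates = [i for i in item_counts if i != positive]
    if strategy == "uniform":
        # Every non-relevant item is equally likely to be drawn.
        return rng.sample(candidates, n)
    if strategy == "popularity":
        # Items are drawn proportionally to their interaction count.
        drawable = [i for i in candidates if item_counts[i] > 0]
        if n > len(drawable):
            raise ValueError("not enough items with non-zero popularity")
        weights = [item_counts[i] for i in drawable]
        chosen = []
        while len(chosen) < n:
            item = rng.choices(drawable, weights=weights, k=1)[0]
            if item not in chosen:
                chosen.append(item)
        return chosen
    raise ValueError(f"unknown strategy: {strategy}")


def hit_at_k(scores, positive, target_set, k):
    """Return 1 if the relevant item is among the top-k of the target set."""
    ranked = sorted(target_set, key=lambda i: scores[i], reverse=True)
    return int(positive in ranked[:k])


# Toy data (assumed): four items with made-up interaction counts and scores.
counts = Counter({0: 50, 1: 30, 2: 15, 3: 5})
scores = {0: 0.1, 1: 0.9, 2: 0.5, 3: 0.7}
positive = 3

negatives = sample_negatives(counts, positive, n=2, strategy="popularity")
sampled_target = [positive] + negatives   # sampled target set
full_target = list(counts)                # full item set as target set

sampled_hit = hit_at_k(scores, positive, sampled_target, k=2)
full_hit = hit_at_k(scores, positive, full_target, k=2)
```

Because the sampled target set omits some items, the relevant item can rank higher in it than in the full item set, which is one way a metric computed on sampled negatives can diverge from the same metric computed on the full ranking.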
