Recommendation Is a Dish Better Served Warm
Danil Gusak, Nikita Sukhorukov, Evgeny Frolov

TL;DR
This paper investigates how arbitrarily chosen cold-start thresholds in recommender systems affect evaluation consistency. It shows that inconsistent criteria can either discard useful data or misclassify cold and warm instances, which undermines the comparability and reliability of evaluation results.
Contribution
It systematically analyzes the effects of cold-start threshold choices on recommendation evaluation across multiple datasets and models, highlighting the importance of standardized criteria.
Findings
Inconsistent cold-start thresholds can remove valuable data.
Threshold variations lead to misclassification of cold and warm instances.
Standardizing thresholds improves evaluation reliability.
Abstract
In modern recommender systems, experimental settings typically include filtering out cold users and items based on a minimum interaction threshold. However, these thresholds are often chosen arbitrarily and vary widely across studies, leading to inconsistencies that can significantly affect the comparability and reliability of evaluation results. In this paper, we systematically explore the cold-start boundary by examining the criteria used to determine whether a user or an item should be considered cold. Our experiments incrementally vary the number of interactions for different items during training, and gradually update the length of user interaction histories during inference. We investigate the thresholds across several widely used datasets, commonly represented in recent papers from top-tier conferences, and on multiple established recommender baselines. Our findings show that…
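As a rough illustration of the minimum-interaction filtering the abstract describes, the sketch below drops users and items that fall under a chosen threshold. It is not the authors' code: the function name `filter_cold`, the parameters `min_user` and `min_item`, and the choice to repeat the filter until no more rows are removed are all assumptions made for this example.

```python
from collections import Counter


def filter_cold(interactions, min_user=5, min_item=5):
    """Drop (user, item) pairs whose user or item is below a threshold.

    Hypothetical helper: the filter is repeated until the data stops
    shrinking, because removing a cold item can push a user below the
    user threshold, and vice versa. A single-pass variant is also common.
    """
    data = list(interactions)
    while True:
        user_counts = Counter(user for user, _ in data)
        item_counts = Counter(item for _, item in data)
        kept = [
            (user, item)
            for user, item in data
            if user_counts[user] >= min_user and item_counts[item] >= min_item
        ]
        if len(kept) == len(data):  # nothing removed, so the filter is stable
            return kept
        data = kept


# Toy interaction log (made-up data): u3 has a single interaction.
log = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a")]
warm = filter_cold(log, min_user=2, min_item=2)
print(len(warm), "of", len(log), "interactions kept")
```

Running the same log with different values of `min_user` and `min_item` changes which users and items survive, which is the kind of sensitivity to threshold choice that the paper examines.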
