Efficient Content-based Recommendation Model Training via Noise-aware Coreset Selection
Hung Vinh Tran, Tong Chen, Hechuan Wen, Quoc Viet Hung Nguyen, Bin Cui, Hongzhi Yin

TL;DR
This paper introduces NaCS, a noise-aware coreset selection framework for content-based recommendation systems (CRSs) that trains on as little as 1% of the data while retaining 93-95% of full-dataset performance.
Contribution
NaCS is a novel framework that constructs high-quality, noise-aware coresets for CRSs using gradient-based optimization and label correction, improving efficiency and robustness.
Findings
NaCS achieves 93-95% of full-dataset performance using only 1% of the training data.
NaCS outperforms existing coreset selection methods in recommendation tasks.
NaCS effectively filters noisy and low-confidence samples, enhancing model quality.
Abstract
Content-based recommendation systems (CRSs) utilize content features to predict user-item interactions, serving as essential tools for helping users navigate information-rich web services. However, ensuring the effectiveness of CRSs requires large-scale and even continuous model training to accommodate diverse user preferences, resulting in significant computational costs and resource demands. A promising approach to this challenge is coreset selection, which identifies a small but representative subset of data samples that preserves model quality while reducing training overhead. Yet, the selected coreset is vulnerable to the pervasive noise in user-item interactions, particularly when it is minimally sized. To this end, we propose Noise-aware Coreset Selection (NaCS), a specialized framework for CRSs. NaCS constructs coresets through submodular optimization based on training…
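The abstract describes coreset construction as submodular optimization driven by training signals. As a hedged illustration of that general family, and not NaCS's actual objective (the truncated abstract does not fully specify it), the sketch below greedily maximizes a facility-location function over per-sample gradient vectors, a common choice in gradient-based coreset selection. The function names, the distance-to-similarity transform, and the assumption that per-sample gradients arrive as plain float vectors are all illustrative.

```python
import math
from typing import List, Sequence, Tuple


def pairwise_similarity(grads: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: convert Euclidean distances between per-sample
    gradient vectors into non-negative similarities, sim(i, j) = d_max - ||g_i - g_j||."""
    n = len(grads)
    dist = [[math.dist(grads[i], grads[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in dist)
    return [[d_max - dist[i][j] for j in range(n)] for i in range(n)]


def greedy_facility_location(
    grads: Sequence[Sequence[float]], budget: int
) -> Tuple[List[int], List[int]]:
    """Greedily pick up to `budget` samples maximizing
    F(S) = sum_i max_{j in S} sim(i, j), then weight each pick by how many
    samples it represents most closely. Illustrative sketch only."""
    n = len(grads)
    if budget <= 0 or n == 0:
        return [], []
    sim = pairwise_similarity(grads)
    cover = [0.0] * n  # best similarity of each sample to the current coreset
    chosen: List[int] = []
    for _ in range(min(budget, n)):
        best_j, best_gain = -1, -1.0
        for j in range(n):
            if j in chosen:
                continue
            # Marginal gain of adding j: total improvement in coverage.
            gain = sum(max(0.0, sim[i][j] - cover[i]) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        chosen.append(best_j)
        cover = [max(cover[i], sim[i][best_j]) for i in range(n)]
    # Each coreset element is weighted by the number of samples assigned to it.
    weights = [0] * len(chosen)
    for i in range(n):
        k = max(range(len(chosen)), key=lambda t: sim[i][chosen[t]])
        weights[k] += 1
    return chosen, weights


if __name__ == "__main__":
    # Toy "gradients": two well-separated clusters of samples.
    toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
    print(greedy_facility_location(toy, 2))
```

Because facility location is monotone submodular, this greedy loop carries the classic (1 - 1/e) approximation guarantee. The sketch deliberately omits the noise-aware parts (label correction and filtering of low-confidence samples), whose details are not given here.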
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Technologies in Various Fields · Domain Adaptation and Few-Shot Learning
