Taming the Long Tail: Denoising Collaborative Information for Robust Semantic ID Generation
Yi Xu, Moyu Zhang, Chaofan Fan, Jinxin Hu, Xiaochen Li, Yu Zhang, Xiaoyi Zeng, Jing Zhang

TL;DR
This paper introduces ADC-SID, a framework that adaptively denoises collaborative information to improve semantic ID generation in recommender systems, particularly for long-tail items.
Contribution
It proposes adaptive alignment and dynamic weighting mechanisms that reduce the impact of collaborative noise on semantic IDs, improving their robustness and expressiveness.
Findings
ADC-SID outperforms existing methods in experiments.
It improves the stability of long-tail item representations.
It reduces the influence of collaborative noise on semantic ID generation.
Abstract
Item IDs form the backbone of industrial recommender systems, but suffer from representation instability and poor long-tail generalization in large, dynamic item corpora. Semantic IDs (SIDs) mitigate these issues by enabling knowledge sharing through quantization of item content features. Existing methods attempt to enhance SID expressiveness by incorporating collaborative information with content features; however, they often overlook a critical distinction: unlike relatively uniform content features, user-item interactions are highly skewed, resulting in a significant quality gap in collaborative information between popular and long-tail items. This mismatch leads to two critical limitations: (1) Collaborative Noise Corrupts Behavior-Content Alignment: Behavior-content alignment is a prevailing approach for modeling shared information. However, indiscriminate alignment allows…
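To make "quantization of item content features" concrete, here is a minimal sketch of how a content vector could be mapped to a semantic ID. It assumes residual quantization with small, hand-written codebooks. The abstract does not say which quantizer is used, so this is a generic illustration and not ADC-SID itself. The names `nearest_code`, `semantic_id`, and `CODEBOOKS` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def semantic_id(feature: Vector, codebooks: Sequence[Sequence[Vector]]) -> Tuple[int, ...]:
    """Quantize a content feature into one code index per level (residual quantization)."""
    residual = list(feature)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        codes.append(idx)
        # The next level quantizes whatever this level failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


# Hypothetical two-level codebooks over 2-D features (toy values, not learned).
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],      # coarse level
    [(0.0, 0.0), (0.25, -0.25)],   # fine level, applied to the residual
]

item_a = (1.2, 0.8)
item_b = (1.0, 1.0)
print(semantic_id(item_a, CODEBOOKS))
print(semantic_id(item_b, CODEBOOKS))
```

In this toy setup the two items receive the same first-level code and differ only at the second level. That shared prefix is the mechanism by which items with similar content can share knowledge, regardless of how many interactions each item has.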
