CoST: Contrastive Quantization based Semantic Tokenization for Generative Recommendation
Jieming Zhu, Mengqun Jin, Qijiong Liu, Zexuan Qiu, Zhenhua Dong, Xiu Li

TL;DR
This paper introduces CoST, a contrastive quantization method for semantic tokenization in generative recommendation, significantly improving retrieval accuracy by better capturing item relationships.
Contribution
The paper presents a novel contrastive quantization approach for semantic tokenization that outperforms existing vector quantization methods in generative recommendation tasks.
Findings
Up to 43% improvement in Recall@5
Up to 44% improvement in NDCG@5
Effective capture of item neighborhood relationships
Abstract
Embedding-based retrieval serves as a dominant approach to candidate item matching for industrial recommender systems. With the success of generative AI, generative retrieval has recently emerged as a new retrieval paradigm for recommendation, which casts item retrieval as a generation problem. Its model consists of two stages: semantic tokenization and autoregressive generation. The first stage involves item tokenization that constructs discrete semantic tokens to index items, while the second stage autoregressively generates semantic tokens of candidate items. Therefore, semantic tokenization serves as a crucial preliminary step for training generative recommendation models. Existing research usually employs a vector quantizer with reconstruction loss (e.g., RQ-VAE) to obtain semantic tokens of items, but this method fails to capture the essential neighborhood relationships that are…
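Illustration
The tokenization stage described in the abstract can be pictured with a minimal sketch of residual quantization, the lookup step behind RQ-VAE-style tokenizers. This is not the authors' implementation and it does not include CoST's contrastive objective. The function name `residual_quantize`, the random codebooks, and the sizes (3 levels, 8 codes, 4 dimensions) are assumptions chosen only for illustration; in a real system the codebooks are learned.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Map one item embedding to a list of discrete tokens, one per level.

    Hypothetical helper for illustration: each level picks the code vector
    nearest to what the previous levels have not yet explained.
    """
    residual = np.asarray(x, dtype=float)
    quantized = np.zeros_like(residual)
    tokens = []
    for codebook in codebooks:
        # Euclidean distance from the current residual to every code vector.
        distances = np.linalg.norm(codebook - residual, axis=1)
        index = int(np.argmin(distances))
        tokens.append(index)
        quantized = quantized + codebook[index]
        # The next level quantizes whatever is left over.
        residual = residual - codebook[index]
    return tokens, quantized

# Assumed toy setup: random codebooks stand in for learned ones.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]
item_embedding = rng.normal(size=4)
tokens, approx = residual_quantize(item_embedding, codebooks)
print(tokens)  # three integers in [0, 8), the item's semantic tokens
```

The resulting token list is what the second stage would generate autoregressively to retrieve an item. A tokenizer trained only to reconstruct embeddings optimizes how close `approx` is to the input, which is the limitation the abstract points to.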
Taxonomy
Topics: Recommender Systems and Techniques
