Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations
Anima Singh, Trung Vu, Nikhil Mehta, Raghunandan Keshavan, Maheswaran Sathiamoorthy, Yilin Zheng, Lichan Hong, Lukasz Heldt, Li Wei, Devansh Tandon, Ed H. Chi, Xinyang Yi

TL;DR
This paper introduces Semantic IDs, a content-derived item representation learned via RQ-VAE, which improves recommendation model generalization to unseen and long-tail items without sacrificing overall quality.
Contribution
It proposes Semantic IDs as a novel discrete item representation that balances memorization and generalization, along with industry-scale adaptation methods using hashing and SentencePiece.
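As a rough illustration of the hashing-based adaptation mentioned above, the sketch below maps a Semantic ID (treated here as a short tuple of integer codes) to rows of a fixed-size embedding table by hashing its contiguous n-grams. The function name `ngram_rows`, the use of CRC32, the position-prefixed key, and the table size are all assumptions made for this example; they are not taken from the paper.

```python
import zlib

def ngram_rows(semantic_id, n, table_size):
    """Hash each contiguous n-gram of codes to a row of an embedding table.

    Hypothetical helper: the key layout and CRC32 hash are illustrative choices.
    """
    rows = []
    for start in range(len(semantic_id) - n + 1):
        gram = semantic_id[start:start + n]
        # Include the start position so the same codes at different
        # levels of the Semantic ID map to different keys.
        key = f"{start}:" + "-".join(str(code) for code in gram)
        rows.append(zlib.crc32(key.encode("utf-8")) % table_size)
    return rows

# Two items whose Semantic IDs share the first two codes.
item_a = (3, 7, 1, 4)
item_b = (3, 7, 9, 9)
rows_a = ngram_rows(item_a, n=2, table_size=1000)
rows_b = ngram_rows(item_b, n=2, table_size=1000)
```

Under these assumptions, items that share a leading n-gram also share the corresponding embedding row, which is one way a model could transfer what it has learned between similar items, while the remaining rows still let it tell the items apart.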
Findings
Semantic IDs outperform random IDs in generalization to new items.
Using SentencePiece enhances adaptation of Semantic IDs in ranking models.
Semantic IDs maintain or improve overall recommendation quality.
Abstract
Randomly-hashed item ids are used ubiquitously in recommendation models. However, the learned representations from random hashing prevent generalization across similar items, causing problems in learning unseen and long-tail items, especially when the item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for random ids. We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability. To strike a good balance of memorization and generalization, we propose to use Semantic IDs -- a compact discrete item representation learned from frozen content embeddings using RQ-VAE that captures the hierarchy of concepts in items -- as a replacement for random item ids. Similar to content embeddings, the compactness of Semantic IDs poses a…
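To make the idea of a hierarchical discrete code more concrete, here is a minimal sketch of the residual quantization step that underlies RQ-VAE: at each level, pick the nearest codebook vector, subtract it, and quantize what is left at the next level. It assumes the codebooks are already given (the toy values below are hand-written, not learned) and leaves out the encoder, decoder, and training losses of an actual RQ-VAE. The names `nearest` and `semantic_id` are hypothetical.

```python
def nearest(codebook, vec):
    """Return the index of the codebook vector closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(code, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def semantic_id(embedding, codebooks):
    """Residual quantization: quantize, subtract, repeat once per level."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # The next level only sees what this level failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)

# Toy codebooks (assumed, not learned): level 1 is coarse, level 2 is fine.
codebooks = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[0.0, 0.0], [1.0, 0.0]],
]
id_a = semantic_id([11.0, 10.0], codebooks)   # (1, 1)
id_b = semantic_id([10.0, 10.1], codebooks)   # (1, 0)
id_c = semantic_id([0.1, 0.0], codebooks)     # (0, 0)
```

In this toy setup the first two embeddings are close together and receive the same first code, differing only at the finer second level, while the distant third embedding differs from the first code onward. That prefix sharing is the sense in which such codes can reflect a coarse-to-fine hierarchy.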
Taxonomy
Topics: Recommender Systems and Techniques · Text and Document Classification Technologies · Topic Modeling
Methods: Byte Pair Encoding · SentencePiece
