Variable-Length Semantic IDs for Recommender Systems
Kirill Khrylchenko

TL;DR
This paper introduces variable-length semantic identifiers for recommender systems, using a probabilistic autoencoder to give popular and long-tail items identifiers of different lengths and so address the limitations of fixed-length identifiers.
Contribution
It proposes a novel variable-length semantic ID method using a discrete variational autoencoder, bridging recommender systems and emergent communication.
Findings
Adaptive-length identifiers improve item representation.
Both long-tail and popular items are modeled better.
Training avoids the instability of REINFORCE-based methods.
Abstract
Generative models are increasingly used in recommender systems, both for modeling user behavior as event sequences and for integrating large language models into recommendation pipelines. A key challenge in this setting is the extremely large cardinality of item spaces, which makes training generative models difficult and introduces a vocabulary gap between natural language and item identifiers. Semantic identifiers (semantic IDs), which represent items as sequences of low-cardinality tokens, have recently emerged as an effective solution to this problem. However, existing approaches generate semantic identifiers of fixed length, assigning the same description length to all items. This is inefficient, misaligned with natural language, and ignores the highly skewed frequency structure of real-world catalogs, where popular items and rare long-tail items exhibit fundamentally different…
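The cost argument in the abstract (every item gets the same description length, despite a highly skewed catalog) can be illustrated with a back-of-envelope sketch. This is not the paper's probabilistic autoencoder. It only compares the average number of tokens per interaction for a fixed-length code against a simple rank-based variable-length code. The catalog size, the codebook size, the Zipf popularity law, and the one extra end-of-sequence token per variable-length ID are all assumptions made here for illustration.

```python
# Illustrative assumptions (not from the paper): catalog size, codebook size,
# Zipf-distributed popularity, and one end-of-sequence token per variable ID.
NUM_ITEMS = 100_000
CODEBOOK_SIZE = 256


def fixed_length(num_items, codebook_size):
    """Smallest L such that codebook_size ** L can address every item."""
    length = 1
    while codebook_size ** length < num_items:
        length += 1
    return length


def variable_lengths(num_items, codebook_size):
    """Code length per item, items ordered from most to least popular.

    The first K items get length 1, the next K**2 get length 2, and so on.
    """
    lengths = []
    length = 1
    while len(lengths) < num_items:
        capacity = codebook_size ** length
        take = min(capacity, num_items - len(lengths))
        lengths.extend([length] * take)
        length += 1
    return lengths


def expected_length(lengths, exponent=1.0, extra_tokens=0):
    """Popularity-weighted mean length under a Zipf law with the given exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, len(lengths) + 1)]
    total = sum(weights)
    return sum(w * (n + extra_tokens) for w, n in zip(weights, lengths)) / total


fixed = fixed_length(NUM_ITEMS, CODEBOOK_SIZE)
lengths = variable_lengths(NUM_ITEMS, CODEBOOK_SIZE)
# Charge one extra token per ID so that IDs of different lengths can be
# told apart when decoded (an assumed end-of-sequence marker).
average = expected_length(lengths, extra_tokens=1)

print(f"fixed-length tokens per item: {fixed}")
print(f"variable-length tokens per interaction (avg): {average:.2f}")
```

Under these assumptions the fixed-length code needs 3 tokens for every item, while the popularity-weighted average of the variable-length code stays below 3 even after paying for the end-of-sequence token, because most interactions fall on the items with short codes.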
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Generative Adversarial Networks and Image Synthesis
