Towards Scalable Semantic Representation for Recommendation
Taolin Zhang, Junwei Pan, Jinpeng Wang, Yaohua Zha, Tao Dai, Bin Chen, Ruisheng Luo, Xiaoxiang Deng, Yuan Wang, Ming Yue, Jie Jiang, Shu-Tao Xia

TL;DR
This paper introduces Mixture-of-Codes, a scalable semantic representation method using multiple codebooks for LLM embeddings, significantly improving recommendation performance by enhancing discriminability and robustness.
Contribution
The paper proposes a novel Mixture-of-Codes approach that scales up semantic representations for recommendation systems, addressing dimension reduction issues in LLM embeddings.
Findings
Achieves superior discriminability and robustness in semantic representations.
Demonstrates best scale-up performance in recommendation tasks.
Outperforms existing methods in experimental evaluations.
Abstract
With recent advances in large language models (LLMs), a growing body of research has focused on developing Semantic IDs based on LLMs to enhance the performance of recommendation systems. However, the dimension of these embeddings needs to match that of the ID embedding in recommendation, which is usually much smaller than the original length. Such dimension compression results in inevitable losses in discriminability and dimension robustness of the LLM embeddings, which motivates us to scale up the semantic representation. In this paper, we propose Mixture-of-Codes, which first constructs multiple independent codebooks for LLM representation in the indexing stage, and then utilizes the Semantic Representation along with a fusion module for the downstream recommendation stage. Extensive analysis and experiments demonstrate that our method achieves superior discriminability and…
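The two stages named in the abstract (multiple independent codebooks at indexing time, then a fusion module downstream) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: plain k-means with a different seed per codebook stands in for the codebook learner, concatenation followed by a linear projection stands in for the fusion module, and the names `kmeans`, `build_codes`, and `FusedSemanticEmbedding` are hypothetical.

```python
import numpy as np


def kmeans(x, k, iters, rng):
    """Plain Lloyd's k-means (a stand-in quantizer). Returns (centroids, assignments)."""
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[j] = members.mean(0)
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dist.argmin(1)


def build_codes(llm_emb, num_codebooks, codebook_size, seed=0):
    """Indexing stage (assumed form): one code per item per codebook.

    Each codebook is fit independently with its own random seed.
    Returns an integer array of shape (n_items, num_codebooks).
    """
    codes = []
    for m in range(num_codebooks):
        rng = np.random.default_rng(seed + m)
        _, assign = kmeans(llm_emb, codebook_size, iters=10, rng=rng)
        codes.append(assign)
    return np.stack(codes, axis=1)


class FusedSemanticEmbedding:
    """Downstream stage (assumed form): per-codebook lookup, then concat + linear fusion."""

    def __init__(self, num_codebooks, codebook_size, dim, seed=0):
        rng = np.random.default_rng(seed)
        # One small embedding table per codebook, each at the recommender's dimension.
        self.tables = rng.normal(0.0, 0.01, size=(num_codebooks, codebook_size, dim))
        # Hypothetical fusion: project the concatenated embeddings back to `dim`.
        self.proj = rng.normal(0.0, 0.01, size=(num_codebooks * dim, dim))

    def __call__(self, codes):
        parts = [self.tables[j][codes[:, j]] for j in range(codes.shape[1])]
        return np.concatenate(parts, axis=1) @ self.proj


# Usage with synthetic data standing in for LLM item embeddings.
llm_emb = np.random.default_rng(42).normal(size=(200, 64))
codes = build_codes(llm_emb, num_codebooks=3, codebook_size=8)
item_repr = FusedSemanticEmbedding(num_codebooks=3, codebook_size=8, dim=16)(codes)
print(codes.shape, item_repr.shape)
```

In this sketch, increasing `num_codebooks` adds more small embedding tables rather than widening a single embedding, which is one way to read "scaling up the semantic representation"; in a real recommender the tables and the projection would be trained jointly with the rest of the model.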
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. This paper discusses an interesting problem: how to scale up semantic representations on transferring knowledge from LLM to recsys, and provides a detailed analysis of the shortcomings of some traditional approaches. 2. The authors propose an effective algorithm (Mixture-of-Codes) to improve the scalability of discrete semantic codebooks, and the experimental results validate the effectiveness of the proposed method.
1. The writing of this paper needs to be improved. For example, Sections 4.1 and 4.2 use the same symbol N to represent two different variables, which seems inconsistent. Besides, there are many experiments and analyses throughout the paper, and the authors should pay more attention to the relationships between them. 2. The use of discrete semantic codebooks in this paper is innovative, but before the authors further discuss how to use them better, they should first elaborate on the advantages and
1. The paper is well-written and easy to understand. 2. The paper pioneers a study of the scalability of semantic representation for transferring knowledge from LLMs to recommendation systems.
1. The current experimental design does not sufficiently validate the authors' claims: a) The improvements are not significant. In Table 1, compared to the non-scaling version, the improvement is often only around 0.1%. b) The necessary baseline comparison using IDs to scale embeddings without using codebooks is missing. If there is no significant improvement compared to this baseline, then there is no need to explore the use of semantic IDs. c) Contradictory to this paper's claim: MoC wil
1. Timely study on semantic IDs, semantic representations, and scaling law in recommender systems. 2. Experiments are conducted on three public datasets. 3. The idea of calculating mutual information as an indicator of discriminability is interesting.
1. Taking RQ-VAE-based semantic IDs for semantic representations is impractical. 1. RQ-VAE-based semantic IDs are structured to maintain sequential dependence across different levels, making them commonly used in generative retrieval or generative recommendation models. In these cases, an autoregressive model is trained to capture these semantic IDs in a sequence-to-sequence fashion. From my understanding, RQ-VAE is not well-suited for direct application as features or semantic representatio
1. Compared with the hierarchical design of RQ-VAE, MoC develops a parallel framework with multiple codebooks. 2. The study conducts intensive visualizations and experiments on three public datasets (Amazon Beauty, Sports, and Toys) and compares MoC with existing approaches.
1. The 3rd paragraph of the Introduction section seems like a paraphrase of the 1st paragraph of Section 3.1. $\textbf{Introduction: }$"Notably, the LLM embeddings usually have very large dimensions, ranging from 4,096 to 16,384 (Dubey et al., 2024). When generating the embeddings for these codes, their dimension needs to match that of the recommendation IDs. However, the dimension in recommendation is usually small due to the Interaction Collapse Theory (Guo et al., 2023)." $\textbf{Section 3
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Machine Learning in Healthcare
