TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendation
Haohao Qu, Wenqi Fan, Zihuai Zhao, Qing Li

TL;DR
TokenRec is a tokenization and retrieval framework for LLM-based recommender systems. It captures collaborative knowledge in discrete tokens and reduces inference time, which improves recommendation performance.
Contribution
The paper proposes a Masked Vector-Quantized (MQ) Tokenizer and a generative retrieval paradigm, which improve the incorporation of high-order collaborative knowledge and the efficiency of LLM-based recommender systems.
Findings
TokenRec outperforms traditional and LLM-based recommenders in experiments.
The MQ Tokenizer effectively captures collaborative knowledge.
The generative retrieval paradigm significantly reduces inference time.
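To make the two ideas named above more concrete, here is a toy sketch in plain Python. It is not the authors' implementation. It assumes that an item is already represented by a collaborative embedding, that the tokenizer maps this embedding to one token per sub-codebook by nearest-neighbour lookup, and that masking means zeroing random entries during training. The chunk-splitting step is a product-quantization-style simplification swapped in for illustration, and every name here (`tokenize`, `retrieve_top_k`, `codebooks`, `item_vectors`) is hypothetical.

```python
import random


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def mask_vector(vec, mask_ratio, rng):
    """Zero out each entry with probability mask_ratio (assumed training-time step)."""
    return [0.0 if rng.random() < mask_ratio else v for v in vec]


def tokenize(vec, codebooks, mask_ratio=0.0, rng=None):
    """Split vec into one chunk per sub-codebook and emit one discrete token per chunk."""
    if mask_ratio > 0.0:
        vec = mask_vector(vec, mask_ratio, rng or random.Random(0))
    chunk = len(vec) // len(codebooks)
    return [
        nearest_code(vec[k * chunk:(k + 1) * chunk], codebook)
        for k, codebook in enumerate(codebooks)
    ]


def retrieve_top_k(query, item_vectors, k):
    """Rank items by dot product with a query vector and return the k best ids."""
    scores = {
        item_id: sum(q * x for q, x in zip(query, vec))
        for item_id, vec in item_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: two sub-codebooks with two 2-d codes each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
item_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}

print(tokenize([0.9, 1.1, 0.8, 0.1], codebooks))      # [1, 1]
print(retrieve_top_k([1.0, 0.2], item_vectors, k=2))  # ['a', 'c']
```

In this sketch the retrieval step scores a single query vector against the item vectors instead of generating item identifiers token by token, which is one plausible reading of why a retrieval paradigm could lower inference time.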
Abstract
There is a growing interest in utilizing large-scale language models (LLMs) to advance next-generation Recommender Systems (RecSys), driven by their outstanding language understanding and in-context learning capabilities. In this scenario, tokenizing (i.e., indexing) users and items becomes essential for ensuring a seamless alignment of LLMs with recommendations. While several studies have made progress in representing users and items through textual contents or latent representations, challenges remain in efficiently capturing high-order collaborative knowledge into discrete tokens that are compatible with LLMs. Additionally, the majority of existing tokenization approaches often face difficulties in generalizing effectively to new/unseen users or items that were not in the training corpus. To address these challenges, we propose a novel framework called TokenRec, which introduces not…
Taxonomy
Topics: Natural Language Processing Techniques · Digital Rights Management and Security · Topic Modeling
