TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendation

Haohao Qu; Wenqi Fan; Zihuai Zhao; Qing Li

arXiv:2406.10450 · cs.IR · August 18, 2025 · 3 cites

Open Access

TL;DR

TokenRec introduces a novel tokenization and retrieval framework for LLM-based recommender systems, effectively capturing collaborative knowledge and reducing inference time, leading to improved recommendation performance.

Contribution

The paper proposes Masked Vector-Quantized (MQ) tokenization and a generative retrieval paradigm, which together incorporate high-order collaborative knowledge and improve the efficiency of LLM-based recommender systems.
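To make the idea of quantizing a masked representation into discrete tokens concrete, here is a minimal sketch. It is not the authors' implementation: this page does not describe the tokenizer's internals, so the random zero-masking, the independent nearest-codeword lookup per codebook, the function names, and the `<c{i}_{k}>` token format are all assumptions chosen for illustration. In practice the codebooks would be learned, whereas here they are fixed toy values.

```python
import random

def mask_vector(vec, mask_ratio, rng):
    """Zero out a random subset of dimensions (assumed masking scheme)."""
    n_mask = int(len(vec) * mask_ratio)
    masked_idx = set(rng.sample(range(len(vec)), n_mask))
    return [0.0 if i in masked_idx else v for i, v in enumerate(vec)]

def nearest_code(vec, codebook):
    """Return the index of the codeword closest to vec (squared L2 distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(vec, codebook[k]))

def tokenize(vec, codebooks, mask_ratio=0.0, rng=None):
    """Map one embedding to one discrete code index per codebook."""
    rng = rng or random.Random(0)
    masked = mask_vector(vec, mask_ratio, rng)
    return [nearest_code(masked, cb) for cb in codebooks]

def to_token_strings(tokens):
    """Render code indices as hypothetical out-of-vocabulary token strings."""
    return [f"<c{i}_{k}>" for i, k in enumerate(tokens)]

# Toy data: two codebooks with two 4-dimensional codewords each.
codebooks = [
    [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]],
    [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]],
]
item_embedding = [0.9, 0.2, 0.8, 0.3]  # stand-in for a collaborative-filtering embedding

tokens = tokenize(item_embedding, codebooks)  # masking disabled for a deterministic demo
print(to_token_strings(tokens))
```

With masking enabled (for example `mask_ratio=0.5`), the same embedding may map to different codes from one call to the next, which is the kind of perturbation a masked tokenizer would need to be robust to during training.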

Findings

1. TokenRec outperforms traditional and LLM-based recommenders in experiments.
2. The MQ Tokenizer effectively captures collaborative knowledge.
3. The generative retrieval paradigm significantly reduces inference time.
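One plausible reading of the inference-time finding is that ranking items against a single query vector avoids decoding item identifiers token by token. The sketch below illustrates that general retrieval step only. The cosine scoring, the function names, and the toy embeddings are assumptions, and the query vector merely stands in for whatever representation the model produces, which this page does not specify.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query, item_embeddings, k):
    """Score every item once against the query and return the k best item IDs."""
    scored = ((cosine(query, emb), item_id) for item_id, emb in item_embeddings.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]

# Toy item pool (hypothetical IDs and embeddings).
item_embeddings = {
    "item_a": [1.0, 0.0],
    "item_b": [0.8, 0.6],
    "item_c": [0.0, 1.0],
}
query = [1.0, 0.1]  # stand-in for a model-produced user preference vector

print(retrieve_top_k(query, item_embeddings, k=2))
```

The cost here is one similarity computation per candidate item, and it does not grow with the length of any generated identifier.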

Abstract

There is a growing interest in utilizing large-scale language models (LLMs) to advance next-generation Recommender Systems (RecSys), driven by their outstanding language understanding and in-context learning capabilities. In this scenario, tokenizing (i.e., indexing) users and items becomes essential for ensuring a seamless alignment of LLMs with recommendations. While several studies have made progress in representing users and items through textual contents or latent representations, challenges remain in efficiently capturing high-order collaborative knowledge into discrete tokens that are compatible with LLMs. Additionally, the majority of existing tokenization approaches often face difficulties in generalizing effectively to new/unseen users or items that were not in the training corpus. To address these challenges, we propose a novel framework called TokenRec, which introduces not…

Taxonomy

Topics: Natural Language Processing Techniques · Digital Rights Management and Security · Topic Modeling