Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs

Yu Liang; Zhongjin Zhang; Yuxuan Zhu; Kerui Zhang; Zhiluohan Guo; Wenhang Zhou; Zonqi Yang; Kangle Wu; Yabo Ni; Anxiang Zeng; Cong Fu; Jianxin Wang; Jiazhi Xia

arXiv:2602.02338 · cs.IR · February 3, 2026

TL;DR

ReSID introduces a recommendation-native framework for semantic ID encoding that improves predictive accuracy and efficiency in sequential recommender systems without relying on large language models.

Contribution

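ReSID is a Semantic ID (SID) framework with two components: FAMAE (Field-Aware Masked Auto-Encoding), which learns predictive item representations, and GAOQ, a quantization scheme designed for efficiency and sequential predictability. Together they address the limitations of existing semantic-centric methods.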

Findings

01. Outperforms existing baselines by over 10% on average.
02. Reduces tokenization cost by up to 122×.
03. Effective across ten diverse datasets.

Abstract

Semantic ID (SID)-based recommendation is a promising paradigm for scaling sequential recommender systems, but existing methods largely follow a semantic-centric pipeline: item embeddings are learned from foundation models and discretized using generic quantization schemes. This design is misaligned with generative recommendation objectives: semantic embeddings are weakly coupled with collaborative prediction, and generic quantization is inefficient at reducing sequential uncertainty for autoregressive modeling. To address these issues, we propose ReSID, a recommendation-native, principled SID framework that rethinks representation learning and quantization from the perspective of information preservation and sequential predictability, without relying on LLMs. ReSID consists of two components: (i) Field-Aware Masked Auto-Encoding (FAMAE), which learns predictive-sufficient item representations…

Datasets

PIIR/ReSID-dataset (dataset, 395 downloads)

Taxonomy

Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare