Drift-Aware Continual Tokenization for Generative Recommendation

Yuebo Feng; Jiahao Liu; Mingzhe Han; Dongsheng Li; Hansu Gu; Peng Zhang; Tun Lu; Ning Gu

arXiv:2603.29705·cs.IR·April 1, 2026

TL;DR

This paper introduces DACT, a drift-aware continual tokenization framework that adaptively updates item token sequences in generative recommendation systems to handle evolving data without extensive retraining.

Contribution

DACT combines drift detection and hierarchical code reassignment to improve tokenization stability and adaptability in dynamic recommendation environments.

Findings

01 DACT outperforms baseline methods on three real-world datasets.

02 It effectively balances adaptation to new data with preservation of existing knowledge.

03 Experiments demonstrate reduced disruption to prior token-embedding alignment.

Abstract

Generative recommendation commonly adopts a two-stage pipeline in which a learnable tokenizer maps items to discrete token sequences (i.e., identifiers) and an autoregressive generative recommender model (GRM) performs prediction based on these identifiers. Recent tokenizers further incorporate collaborative signals so that items with similar user-behavior patterns receive similar codes, substantially improving recommendation quality. However, real-world environments evolve continuously: new items cause identifier collisions and shifts, while new interactions induce collaborative drift in existing items (e.g., changing co-occurrence patterns and popularity). Fully retraining both the tokenizer and the GRM is often prohibitively expensive, yet naively fine-tuning the tokenizer can alter the token sequences of the majority of existing items, undermining the GRM's learned token-embedding alignment. To…
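To make the failure modes named in the abstract concrete, the following is a minimal sketch of how identifier churn and identifier collisions could be measured around a tokenizer update. It is not DACT's method: the toy tokenizer assumes a residual-quantization scheme with fixed hierarchical codebooks, and every name in it (`tokenize`, `changed_fraction`, `collisions`, the sample vectors) is hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Code = Tuple[int, ...]


def tokenize(vec: Vector, codebooks: Sequence[Sequence[Vector]]) -> Code:
    """Toy residual quantizer (an assumption, not the paper's tokenizer).

    At each level, pick the nearest codeword to the current residual,
    record its index, and subtract it before moving to the next level.
    """
    residual = list(vec)
    tokens: List[int] = []
    for codebook in codebooks:
        idx = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(tokens)


def changed_fraction(old_ids: Dict[str, Code], new_ids: Dict[str, Code]) -> float:
    """Fraction of items present in both snapshots whose token sequence changed."""
    shared = old_ids.keys() & new_ids.keys()
    if not shared:
        return 0.0
    changed = sum(1 for item in shared if old_ids[item] != new_ids[item])
    return changed / len(shared)


def collisions(ids: Dict[str, Code]) -> Dict[Code, List[str]]:
    """Group items that share an identical token sequence (identifier collisions)."""
    groups: Dict[Code, List[str]] = {}
    for item, code in ids.items():
        groups.setdefault(code, []).append(item)
    return {code: sorted(items) for code, items in groups.items() if len(items) > 1}


if __name__ == "__main__":
    # Two-level codebooks with two codewords each (hypothetical values).
    codebooks = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 0.0), (1.0, 1.0)]]
    before = {"a": (0.1, 0.1), "b": (10.9, 10.9)}
    # Item "a" drifts, and a new item "c" arrives.
    after = {"a": (0.9, 0.9), "b": (10.9, 10.9), "c": (1.1, 1.1)}

    old_ids = {k: tokenize(v, codebooks) for k, v in before.items()}
    new_ids = {k: tokenize(v, codebooks) for k, v in after.items()}
    print("changed fraction:", changed_fraction(old_ids, new_ids))
    print("collisions:", collisions(new_ids))
```

A high changed fraction after an update is the situation the abstract warns about: the GRM's embeddings were learned against the old token sequences, so each changed identifier is one the recommender has to relearn.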

Code & Models

Repositories

HomesAmaranta/DACT (GitHub)
