CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation

Wenzhuo Cheng; Menghang Gong; Qixin Guo; Hang Zheng; Zhaobin Yang; Jianguo Lou; Zhengwei Zheng

arXiv:2605.05096·cs.IR·May 7, 2026

TL;DR

CapsID introduces a capsule-routing tokenizer for generative recommendation that improves semantic fidelity and inference efficiency over hard residual quantization.

Contribution

CapsID replaces hard residual quantization with capsule routing, improving semantic fidelity and reducing inference latency in generative recommendation.

Findings

01 CapsID improves Recall@10 by 9.6% on average over ReSID.
02 CapsID matches or exceeds COBRA-style systems on benchmarks.
03 CapsID runs at 51% of the inference latency of comparable systems.

Abstract

Generative recommendation maps each item to a sequence of Semantic IDs (SIDs) and recasts retrieval as autoregressive token generation. In this paradigm the main bottleneck is the tokenizer rather than the Transformer: residual vector quantization with a hard nearest-neighbor assignment at every layer collapses multi-faceted item semantics at cluster boundaries and propagates early errors to later SID positions. A common workaround is to append a dense vector or attribute prefix to the SID, but this dual-representation design inflates inference cost and gives up the simplicity of a generative interface. We address the bottleneck at the tokenizer itself. CapsID replaces hard residual quantization with capsule routing: at each layer an item probabilistically routes to several semantic capsules, the residual is updated by the routed reconstruction rather than by a single winning code, and…
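To make the contrast in the abstract concrete, here is a minimal sketch of one quantization layer in both styles. It is an illustration under stated assumptions, not the paper's implementation: the routing weights are taken to be a softmax over negative squared distances, and the names `hard_step`, `soft_step`, and `temperature` are hypothetical. The paper's actual capsule routing procedure is not given in the text above and may differ.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hard_step(residual, codebook):
    """Hard residual quantization: subtract the single nearest code."""
    idx = min(range(len(codebook)), key=lambda k: sq_dist(residual, codebook[k]))
    new_residual = [r - c for r, c in zip(residual, codebook[idx])]
    return idx, new_residual

def soft_step(residual, codebook, temperature=1.0):
    """Soft routing (ASSUMED form): softmax weights over all codes,
    residual updated by the weighted reconstruction."""
    logits = [-sq_dist(residual, c) / temperature for c in codebook]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    recon = [sum(w * c[d] for w, c in zip(weights, codebook))
             for d in range(len(residual))]
    new_residual = [r - x for r, x in zip(residual, recon)]
    return weights, new_residual

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Toy data (made up): the item sits exactly on the boundary between codes 0 and 1.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
item = [0.5, 0.5]

idx, hard_res = hard_step(item, codebook)
weights, soft_res = soft_step(item, codebook)

print("hard: code", idx, "residual norm", round(norm(hard_res), 3))
print("soft: weights", [round(w, 3) for w in weights],
      "residual norm", round(norm(soft_res), 3))
```

In this toy case the hard assignment must commit to one of two equally close codes and passes a large residual to the next layer, while the soft update splits its weight between both. That is only an illustration of the boundary effect the abstract describes; a soft update is not guaranteed to leave a smaller residual in general.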
