TL;DR
CARD introduces a unified visual semantic unit and a non-uniform quantization method to improve generative recommendation by modeling item semantics holistically and reducing codeword imbalance.
Contribution
The paper proposes a novel framework that combines a structured visual semantic unit with learnable non-uniform quantization to improve Semantic ID (SID) quality and address the imbalanced embedding distribution.
Findings
CARD outperforms baseline methods on multiple datasets.
The non-uniform quantization module improves codebook utilization.
The proposed modules are robust and plug-and-play across schemes.
Abstract
Generative recommendation frameworks typically represent items as discrete Semantic IDs (SIDs). While existing studies have sought to enhance SID construction by incorporating multimodal content, collaborative signals, or more advanced quantization techniques, learning high-quality SIDs still faces two key challenges: (1) The two-stage generative recommendation paradigm (SID construction and autoregressive generation) provides insufficient supervision for heterogeneous fusion, which hinders learning high-quality SIDs, and (2) non-uniform embeddings lead to codeword imbalance and generation bias. To address these challenges, we propose a novel generative recommendation framework, called CARD. CARD introduces a visual semantic unit that unifies textual, visual, and collaborative signals into a structured visual representation prior to encoding, enabling holistic semantic modeling and…
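To make challenge (2) concrete, here is a minimal sketch of how codeword imbalance can be measured. It is not CARD's quantization module: it assumes plain nearest-neighbor assignment against a fixed codebook, and the function names (`assign_codewords`, `usage_stats`) and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def assign_codewords(embeddings, codebook):
    """Assign each embedding to its nearest codeword (squared Euclidean distance)."""
    # Pairwise distances, shape: (num_items, num_codewords)
    d2 = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def usage_stats(assignments, num_codewords):
    """Return per-codeword counts, the fraction of codewords used, and perplexity."""
    counts = np.bincount(assignments, minlength=num_codewords)
    probs = counts / counts.sum()
    utilization = float((counts > 0).mean())
    nonzero = probs[probs > 0]
    # Perplexity equals num_codewords when usage is perfectly balanced
    # and approaches 1 when a single codeword absorbs every item.
    perplexity = float(np.exp(-(nonzero * np.log(nonzero)).sum()))
    return counts, utilization, perplexity


# Toy data (assumption): a evenly spaced codebook, but embeddings clustered near the origin.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
embeddings = np.array([
    [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0],
    [0.05, 0.05], [0.9, 0.1], [0.1, 0.2],
])

assignments = assign_codewords(embeddings, codebook)
counts, utilization, perplexity = usage_stats(assignments, len(codebook))
print(counts, utilization, perplexity)
```

In this toy setup, five of the six items collapse onto one codeword and two codewords are never used. That skew is the kind of imbalance that a non-uniform quantizer is meant to reduce.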
