Multi-Aspect Cross-modal Quantization for Generative Recommendation

Fuwei Zhang; Xiaoyu Liu; Dongbo Xi; Jishen Yin; Huan Chen; Peng Yan; Fuzhen Zhuang; Zhao Zhang

arXiv:2511.15122 · cs.IR · November 25, 2025

TL;DR

This paper introduces MACRec, a method that uses multi-aspect cross-modal quantization to improve semantic ID learning and generative recommendation by integrating multimodal information and alignments.

Contribution

The paper proposes a multi-aspect cross-modal quantization framework that enhances semantic ID quality and generative recommendation performance by incorporating multimodal data and alignments.

Findings

01 Improved codebook usability through cross-modal quantization.

02 Enhanced generative recommendation accuracy with multi-aspect alignments.

03 Demonstrated effectiveness on three benchmark datasets.

Abstract

Generative Recommendation (GR) has emerged as a new paradigm in recommender systems. This approach relies on quantized representations to discretize item features, modeling users' historical interactions as sequences of discrete tokens. Based on these tokenized sequences, GR predicts the next item by employing next-token prediction methods. The challenges of GR lie in constructing high-quality semantic identifiers (IDs) that are hierarchically organized, minimally conflicting, and conducive to effective generative model training. However, current approaches remain limited in their ability to harness multimodal information and to capture the deep and intricate interactions among diverse modalities, both of which are essential for learning high-quality semantic IDs and for effectively training GR models. To address this, we propose Multi-Aspect Cross-modal quantization for generative…
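The pipeline the abstract describes (discretize item features into semantic IDs, then treat a user's history as a token sequence for next-token prediction) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than MACRec itself: it uses plain greedy residual quantization as one common way to obtain hierarchical IDs, random untrained codebooks, a single embedding per item with no cross-modal component, and hypothetical helper names `residual_quantize` and `tokenize_history`.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Greedy residual quantization: pick the nearest code at each level,
    subtract it, and quantize what is left at the next level."""
    residual = np.asarray(x, dtype=float)
    codes = []
    for cb in codebooks:  # each cb has shape (codebook_size, dim)
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        residual = residual - cb[idx]
    return tuple(codes)

def tokenize_history(item_ids, semantic_ids, codebook_size):
    """Flatten an item history into one token sequence.
    Codes at level l are offset by l * codebook_size so that
    different levels never share a token id."""
    tokens = []
    for item in item_ids:
        for level, code in enumerate(semantic_ids[item]):
            tokens.append(level * codebook_size + code)
    return tokens

# Illustrative setup (assumed sizes; random, untrained codebooks).
rng = np.random.default_rng(0)
num_items, dim, levels, codebook_size = 6, 8, 3, 4
item_emb = rng.normal(size=(num_items, dim))
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(levels)]

# One semantic ID (a tuple of `levels` codes) per item.
semantic_ids = {i: residual_quantize(item_emb[i], codebooks)
                for i in range(num_items)}

# A user's history becomes the input sequence for a next-token model.
history = [0, 3, 5]
tokens = tokenize_history(history, semantic_ids, codebook_size)
```

In a sketch like this, two items whose embeddings fall into the same codes at every level receive identical semantic IDs. That is the kind of conflict the abstract says high-quality IDs should minimize.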

Taxonomy

Topics: Recommender Systems and Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)