MMGRec: Multimodal Generative Recommendation with Transformer Model
Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie

TL;DR
This paper introduces MMGRec, a novel multimodal recommendation model that employs a generative approach with a Transformer and hierarchical quantization to improve recommendation accuracy and efficiency.
Contribution
The paper proposes a generative paradigm for multimodal recommendation using a Transformer and a new hierarchical quantization method, addressing limitations of previous embed-and-retrieve models.
Findings
Outperforms state-of-the-art methods in recommendation accuracy
Reduces inference cost compared to traditional embed-and-retrieve models
Effectively models non-sequential interaction data with relation-aware self-attention
Abstract
Multimodal recommendation aims to recommend user-preferred candidates based on her/his historically interacted items and associated multimodal information. Previous studies commonly employ an embed-and-retrieve paradigm: learning user and item representations in the same embedding space, then retrieving similar candidate items for a user via embedding inner product. However, this paradigm suffers from inference cost, interaction modeling, and false-negative issues. Toward this end, we propose a new MMGRec model to introduce a generative paradigm into multimodal recommendation. Specifically, we first devise a hierarchical quantization method Graph RQ-VAE to assign Rec-ID for each item from its multimodal and CF information. Consisting of a tuple of semantically meaningful tokens, Rec-ID serves as the unique identifier of each item. Afterward, we train a Transformer-based recommender to…
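The abstract describes Rec-ID as a tuple of tokens produced by hierarchical quantization. The sketch below illustrates only the general idea of residual quantization, which is the basis of RQ-VAE-style methods. It is not the paper's Graph RQ-VAE: it leaves out the graph component, the learned encoder, and all training. The function names, the hand-picked codebooks, and the 2-dimensional item vector are illustrative assumptions.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    def dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def residual_quantize(vec, codebooks):
    """Pick one token per level; each level quantizes what the previous levels left over."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen code so the next level only sees the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(tokens), residual


def reconstruct(tokens, codebooks):
    """Sum the selected code vectors to approximate the original vector."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy setup: two levels, two codes per level, hand-picked values.
# A real system would learn the codebooks from item features.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],   # coarse level
    [[0.5, 0.0], [0.0, 0.5]],   # finer level
]
item_vec = [1.0, 0.4]           # stand-in for a fused item representation

rec_id, residual = residual_quantize(item_vec, codebooks)
print(rec_id)                   # (0, 1)
print(reconstruct(rec_id, codebooks))
```

Each level refines the approximation left by the previous one, so the earlier tokens carry coarse information and the later ones carry finer detail. In this toy setting, that ordering is what makes the tuple hierarchical.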
Taxonomy
Topics: Topic Modeling · Recommender Systems and Techniques · Speech and dialogue systems
Methods: Attention Is All You Need · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
