DiffMM: Multi-Modal Diffusion Model for Recommendation
Yangqin Jiang, Lianghao Xia, Wei Wei, Da Luo, Kangyi Lin, and Chao Huang

TL;DR
DiffMM introduces a multi-modal graph diffusion model with contrastive learning to enhance user representations in recommendation systems, effectively addressing data sparsity and improving multi-modal alignment.
Contribution
The paper proposes a novel DiffMM framework that combines graph diffusion and contrastive learning to better integrate multi-modal data in recommendation systems.
Findings
Outperforms baseline models on three public datasets
Effectively aligns multi-modal features with user-item interactions
Enhances user representation learning in sparse data scenarios
Abstract
The rise of online multi-modal sharing platforms like TikTok and YouTube has enabled personalized recommender systems to incorporate multiple modalities (such as visual, textual, and acoustic) into user representations. However, data sparsity remains a key challenge in these systems. To address it, recent research has introduced self-supervised learning techniques to enhance recommender systems. These methods, however, often rely on simplistic random augmentation or intuitive cross-view information, which can introduce irrelevant noise and fail to accurately align the multi-modal context with user-item interaction modeling. To fill this research gap, we propose a novel multi-modal graph diffusion model for recommendation called DiffMM. Our framework integrates a modality-aware graph diffusion model with a cross-modal contrastive learning paradigm to…
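To make the "cross-modal contrastive learning" idea in the abstract concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss between two modality views of the same users. It is not taken from the DiffMM paper or its code: the function names `cosine` and `info_nce`, the temperature value, and the toy embeddings are all illustrative assumptions, and the actual DiffMM objective may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss (hypothetical helper, not the paper's code).

    Row i of view_a and row i of view_b are treated as a positive pair;
    every other row of view_b serves as a negative for anchor i.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(view_a)


# Toy user embeddings from two assumed modality views (made-up numbers).
visual_view = [[1.0, 0.0], [0.0, 1.0]]
textual_aligned = [[0.9, 0.1], [0.1, 0.9]]   # same users, similar directions
textual_shuffled = [[0.1, 0.9], [0.9, 0.1]]  # users mismatched across views

loss_aligned = info_nce(visual_view, textual_aligned)
loss_shuffled = info_nce(visual_view, textual_shuffled)
print(loss_aligned < loss_shuffled)
```

The loss is lower when the two views agree on which user is which, which is the sense in which a contrastive objective pushes modality-specific representations toward alignment.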
Taxonomy
Topics: Recommender Systems and Techniques
Methods: ALIGN · Contrastive Learning · Diffusion · Attentive Walk-Aggregating Graph Neural Network
