DMESR: Dual-view MLLM-based Enhancing Framework for Multimodal Sequential Recommendation
Mingyao Huang, Qidong Liu, Wenxuan Yang, Moranxin Wang, Yuqi Sun, Haiping Zhu, Feng Tian, Yan Chen

TL;DR
This paper introduces DMESR, a framework for multimodal sequential recommendation that aligns the cross-modal semantic representations produced by MLLMs and fuses them with semantics drawn from each item's original text, with the aim of improving recommendation accuracy.
Contribution
The paper proposes a dual-view framework with contrastive learning and cross-attention fusion to better utilize multimodal and textual semantics in sequential recommendation.
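To make the two ingredients named above concrete, here is a minimal, dependency-free sketch of a contrastive alignment loss and a cross-attention fusion step. This summary does not give the paper's actual loss, attention layout, or hyperparameters, so everything here is an assumption for illustration: the InfoNCE form of the loss, single-head scaled dot-product attention without learned projections, the choice of the text view as the query, and the names `info_nce`, `cross_attention`, `mllm_view`, and `text_view`.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # guard against all-zero vectors
    return [x / norm for x in v]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def info_nce(view_a, view_b, temperature=0.1):
    """Illustrative InfoNCE loss (an assumed form, not the paper's).

    Row i of view_a and row i of view_b embed the same item (positive
    pair); every other row of view_b serves as an in-batch negative.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    loss = 0.0
    for i, a_i in enumerate(a):
        logits = [dot(a_i, b_j) / temperature for b_j in b]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum_exp - logits[i]  # -log softmax(logits)[i]
    return loss / len(a)


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections.

    Each query row attends over all key rows and returns a weighted
    average of the value rows.
    """
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        fused.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return fused


# Toy embeddings for two items (hypothetical values).
mllm_view = [[1.0, 0.0], [0.0, 1.0]]
text_view = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = info_nce(mllm_view, text_view)
shuffled_loss = info_nce(mllm_view, text_view[::-1])
fused = cross_attention(text_view, mllm_view, mllm_view)
```

In this toy setup the loss is lower when the two views of the same item are paired correctly than when the pairing is shuffled, which is the behavior a contrastive alignment objective is meant to reward. A practical model would typically add learned query, key, and value projections and multiple heads.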
Findings
Outperforms existing methods on three real-world datasets
Effectively aligns cross-modal semantic representations
Enhances recommendation accuracy across multiple architectures
Abstract
Sequential Recommender Systems (SRS) aim to predict users' next interaction based on their historical behaviors, but they still face the challenge of data sparsity. With the rapid advancement of Multimodal Large Language Models (MLLMs), leveraging their multimodal understanding capabilities to enrich item semantic representations has emerged as an effective enhancement strategy for SRS. However, existing MLLM-enhanced recommendation methods still suffer from two key limitations. First, they struggle to effectively align multimodal representations, leading to suboptimal utilization of semantic information across modalities. Second, they often rely too heavily on MLLM-generated content while overlooking the fine-grained semantic cues contained in the original textual data of items. To address these issues, we propose a Dual-view MLLM-based Enhancing framework for multimodal Sequential…
Taxonomy
Topics: Recommender Systems and Techniques · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)
