DMESR: Dual-view MLLM-based Enhancing Framework for Multimodal Sequential Recommendation

Mingyao Huang; Qidong Liu; Wenxuan Yang; Moranxin Wang; Yuqi Sun; Haiping Zhu; Feng Tian; Yan Chen

arXiv:2602.13715 · cs.IR · February 17, 2026

TL;DR

This paper introduces DMESR, a framework that enhances multimodal sequential recommendation by aligning and fusing item semantic representations derived from multimodal large language models (MLLMs) with those from the items' original text, improving recommendation accuracy.

Contribution

The paper proposes a dual-view framework with contrastive learning and cross-attention fusion to better utilize multimodal and textual semantics in sequential recommendation.
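The two techniques named above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an InfoNCE-style contrastive loss and a single-head dot-product cross-attention without learned projections, and the names `info_nce`, `cross_attention`, `mllm_view`, and `text_view` are hypothetical. The toy vectors stand in for per-item embeddings from the two views.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # L2-normalize so that dot products become cosine similarities.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss: row i of view_b is the positive for row i of view_a,
    and every other row of view_b acts as a negative. (Assumed loss form.)"""
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    loss = 0.0
    for i, a_i in enumerate(a):
        probs = softmax([dot(a_i, b_j) / temperature for b_j in b])
        loss -= math.log(probs[i])
    return loss / len(a)


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention with no learned projections.
    Each query row attends over all key rows and mixes the value rows."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        fused.append([sum(w * v[d] for w, v in zip(weights, values))
                      for d in range(dim)])
    return fused


# Hypothetical toy embeddings for two items, one row per item.
mllm_view = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for MLLM-derived embeddings
text_view = [[0.9, 0.1], [0.1, 0.9]]   # stand-in for original-text embeddings

aligned_loss = info_nce(mllm_view, text_view)
shuffled_loss = info_nce(mllm_view, text_view[::-1])  # mismatched item pairs
fused = cross_attention(text_view, mllm_view, mllm_view)
```

In this sketch the loss is lower when the two views of the same item are paired correctly than when the pairs are shuffled, which is the property a contrastive alignment objective relies on. The fusion step lets each text-view embedding pull in information from the MLLM view in proportion to their similarity.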

Findings

01. Outperforms existing methods on three real-world datasets

02. Effectively aligns cross-modal semantic representations

03. Enhances recommendation accuracy across multiple architectures

Abstract

Sequential Recommender Systems (SRS) aim to predict a user's next interaction from their historical behavior, but they still face the challenge of data sparsity. With the rapid advancement of Multimodal Large Language Models (MLLMs), leveraging their multimodal understanding capabilities to enrich item semantic representations has emerged as an effective enhancement strategy for SRS. However, existing MLLM-enhanced recommendation methods still suffer from two key limitations. First, they struggle to align multimodal representations effectively, leading to suboptimal use of semantic information across modalities. Second, they often rely too heavily on MLLM-generated content while overlooking the fine-grained semantic cues contained in the original textual data of items. To address these issues, we propose a Dual-view MLLM-based Enhancing framework for multimodal Sequential…

Taxonomy

Topics: Recommender Systems and Techniques · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)