LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation

Wangyu Wu; Zhenhong Chen; Wenqiao Zhang; Xianglin Qiu; Siqi Song; Xiaowei Huang; Fei Ma; Jimin Xiao

arXiv:2506.17966 · cs.IR · March 2, 2026

TL;DR

This paper introduces LLM-EMF, a novel multimodal fusion approach that leverages large language models and visual-text data to improve cross-domain sequential recommendation accuracy.

Contribution

It proposes a new method integrating LLM-enhanced multimodal data with a multi-attention mechanism for better cross-domain user preference modeling.

Findings

1. Outperforms existing methods on four e-commerce datasets
2. Effectively captures complex user preferences across domains
3. Demonstrates the benefit of multimodal data in recommendation systems

Abstract

Cross-Domain Sequential Recommendation (CDSR) predicts user behavior from historical interactions across multiple domains, focusing on modeling cross-domain preferences and capturing both intra- and inter-sequence item relationships. We propose LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation (LLM-EMF), an approach that enhances textual information with Large Language Model (LLM) knowledge and significantly improves recommendation performance by fusing visual and textual data. Using a frozen CLIP model, we generate image and text embeddings, enriching item representations with multimodal data. A multi-attention mechanism jointly learns single-domain and cross-domain preferences, capturing complex user interests across diverse domains. Evaluations conducted on four e-commerce…
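To make the pipeline in the abstract concrete, here is a minimal sketch of the two ideas it names: fusing per-item image and text embeddings, then applying attention over single-domain and cross-domain sequences. It is an illustration under stated assumptions, not the authors' implementation. Random vectors stand in for frozen CLIP outputs, and the concatenation-based `fuse`, the projection-free `self_attention`, the mean-pooling `user_vector`, and the `fake_item` helper are all hypothetical names and choices that the abstract does not specify.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def fuse(image_emb, text_emb):
    """Assumed fusion rule: concatenate the L2-normalized image and text vectors."""
    return l2_normalize(image_emb) + l2_normalize(text_emb)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with identity projections (Q = K = V = seq)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out


def user_vector(seq):
    """Assumed pooling: average the attended item vectors into one preference vector."""
    attended = self_attention(seq)
    return [sum(col) / len(attended) for col in zip(*attended)]


rng = random.Random(0)


def fake_item(dim=4):
    """Hypothetical stand-in for one item's frozen CLIP image and text embeddings."""
    image_emb = [rng.gauss(0, 1) for _ in range(dim)]
    text_emb = [rng.gauss(0, 1) for _ in range(dim)]
    return fuse(image_emb, text_emb)


# Two toy interaction sequences for one user, one per domain.
domain_a = [fake_item() for _ in range(3)]
domain_b = [fake_item() for _ in range(2)]
# Simplification: a real cross-domain sequence would be ordered by timestamp.
cross_domain = domain_a + domain_b

pref_a = user_vector(domain_a)          # single-domain preference, domain A
pref_b = user_vector(domain_b)          # single-domain preference, domain B
pref_cross = user_vector(cross_domain)  # cross-domain preference
```

In this sketch each fused item vector has length 8 (4 image dimensions plus 4 text dimensions), and the three preference vectors could then be scored against candidate items, for example by dot product. A learned model would add trainable query, key, and value projections in place of the identity projections used here.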
