LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation
Wangyu Wu, Zhenhong Chen, Wenqiao Zhang, Xianglin Qiu, Siqi Song, Xiaowei Huang, Fei Ma, Jimin Xiao

TL;DR
This paper introduces LLM-EMF, a novel multimodal fusion approach that leverages large language models and visual-text data to improve cross-domain sequential recommendation accuracy.
Contribution
It proposes a new method integrating LLM-enhanced multimodal data with a multi-attention mechanism for better cross-domain user preference modeling.
Findings
Outperforms existing methods on four e-commerce datasets
Effectively captures complex user preferences across domains
Demonstrates the benefit of multimodal data in recommendation systems
Abstract
Cross-Domain Sequential Recommendation (CDSR) predicts user behavior by leveraging historical interactions across multiple domains, focusing on modeling cross-domain preferences and capturing both intra- and inter-sequence item relationships. We propose LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation (LLM-EMF), an approach that enhances textual information with Large Language Model (LLM) knowledge and improves recommendation performance by fusing visual and textual data. Using the frozen CLIP model, we generate image and text embeddings, enriching item representations with multimodal data. A multi-attention mechanism jointly learns single-domain and cross-domain preferences, capturing complex user interests across diverse domains. Evaluations conducted on four e-commerce…
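The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how image and text embeddings are fused or how the attention modules are parameterised, so the concatenation in `fuse`, the unparameterised causal attention, and every function name here are assumptions made for illustration. The toy vectors stand in for frozen-CLIP outputs, which would be computed once and kept fixed.

```python
import math

def fuse(image_emb, text_emb):
    """Concatenate per-item image and text vectors (assumed fusion; the paper's may differ)."""
    return [img + txt for img, txt in zip(image_emb, text_emb)]

def causal_self_attention(seq):
    """Scaled dot-product self-attention without learned weights, with a causal mask."""
    d = len(seq[0])
    out = []
    for i, q in enumerate(seq):
        # Position i may only attend to positions 0..i.
        visible = seq[: i + 1]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return out

def user_representation(domain_a, domain_b, merged):
    """Concatenate the last-position output of each attention pass
    (two single-domain passes plus one cross-domain pass)."""
    return [x for seq in (domain_a, domain_b, merged)
            for x in causal_self_attention(seq)[-1]]

# Toy stand-ins for frozen-CLIP outputs: 2-d image and 2-d text vector per item.
img_a = [[1.0, 0.0], [0.0, 1.0]]
txt_a = [[0.5, 0.5], [1.0, 0.0]]
img_b = [[0.0, 1.0]]
txt_b = [[0.2, 0.8]]

items_a = fuse(img_a, txt_a)
items_b = fuse(img_b, txt_b)
merged = [items_a[0], items_b[0], items_a[1]]  # assumed chronological interleaving
rep = user_representation(items_a, items_b, merged)
```

Here `rep` has one 4-dimensional slice per attention pass, so a downstream scorer could compare it against candidate item embeddings. A real model would add learned query, key, and value projections and train them on next-item prediction.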
