MMSRARec: Summarization and Retrieval Augumented Sequential Recommendation Based on Multimodal Large Language Model
Haoyu Wang, Yitong Wang, Jining Wang

TL;DR
This paper introduces MMSRARec, a multimodal recommendation system that uses large language models to summarize items, incorporate collaborative signals, and improve recommendation accuracy with interpretability and efficiency.
Contribution
The paper proposes a novel multimodal sequential recommendation framework that combines summarization, retrieval, and collaborative signals with fine-tuning of large language models.
Findings
Demonstrates effective recommendation accuracy on benchmark datasets.
Enhances interpretability through item summarization.
Balances computational cost through a retrieval-augmented approach.
Abstract
Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant potential in recommendation systems. However, the effective application of MLLMs to multimodal sequential recommendation remains unexplored: A) Existing methods primarily leverage the multimodal semantic understanding capabilities of pre-trained MLLMs to generate item embeddings or semantic IDs, thereby enhancing traditional recommendation models. These approaches generate item representations that exhibit limited interpretability and pose challenges when transferring to language model-based recommendation systems. B) Other approaches convert user behavior sequences into image-text pairs and perform recommendation through multiple MLLM inference passes, incurring prohibitive computational and time costs. C) Current MLLM-based recommendation systems generally neglect the integration of collaborative…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
