NoteLLM-2: Multimodal Large Representation Models for Recommendation
Chao Zhang, Haoxin Zhang, Shiwei Wu, Di Wu, Tong Xu, Xiangyu Zhao, Yan Gao, Yao Hu, Enhong Chen

TL;DR
NoteLLM-2 is a multimodal representation framework that strengthens the integration of visual information into large language models for recommendation, addressing the tendency of fine-tuned LLMs to neglect image content.
Contribution
The paper presents an end-to-end fine-tuning approach that combines existing LLMs with vision encoders, plus two methods for strengthening visual information in the resulting representations (a prompt-based method and late fusion), improving multimodal recommendation.
Findings
Prompt-based and late-fusion techniques yield effective multimodal representations.
Extensive online and offline experiments show improved recommendation performance.
The framework balances visual and textual information for better item-to-item recommendations.
Abstract
Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks. However, their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains underexplored. While leveraging existing Multimodal Large Language Models (MLLMs) for such tasks is promising, challenges arise due to their delayed release compared to corresponding LLMs and the inefficiency in representation tasks. To address these issues, we propose an end-to-end fine-tuning method that customizes the integration of any existing LLMs and vision encoders for efficient multimodal representation. Preliminary experiments revealed that fine-tuned LLMs often neglect image content. To counteract this, we propose NoteLLM-2, a novel framework that enhances visual information. Specifically, we propose two approaches: first, a prompt-based method that…
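The abstract is truncated before it describes either approach in detail, so the following is only a generic illustration of what "late fusion" can mean: combining a text-side embedding with an image embedding after both have been computed, here through a learned element-wise gate. The function names, the gate formulation, and the toy vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_late_fusion(text_emb, image_emb, gate_weights, gate_bias):
    """Blend two same-sized embeddings with an element-wise gate.

    Hypothetical sketch: gate_weights has one row per output dimension,
    and each row has len(text_emb) + len(image_emb) entries.
    """
    if len(text_emb) != len(image_emb):
        raise ValueError("embeddings must share a dimension")
    joint = list(text_emb) + list(image_emb)
    fused = []
    for i, (t, v) in enumerate(zip(text_emb, image_emb)):
        logit = sum(w * x for w, x in zip(gate_weights[i], joint)) + gate_bias[i]
        g = sigmoid(logit)  # g near 1 favors the image, g near 0 the text
        fused.append(g * v + (1.0 - g) * t)
    return fused


def cosine(a, b):
    """Cosine similarity, a common score for item-to-item retrieval."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy usage: zero gate parameters give an even text/image blend.
zero_weights = [[0.0] * 4 for _ in range(2)]
note_a = gated_late_fusion([1.0, 0.0], [0.0, 1.0], zero_weights, [0.0, 0.0])
note_b = gated_late_fusion([0.0, 1.0], [1.0, 0.0], zero_weights, [0.0, 0.0])
similarity = cosine(note_a, note_b)
```

In a trained system the gate parameters would be learned jointly with the rest of the model; a gate of this kind is one way to keep the image signal from being drowned out by the text, which is the failure mode the abstract reports for naively fine-tuned LLMs.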
Taxonomy
Topics: Topic Modeling · Recommender Systems and Techniques
Methods: Focus
