NoteLLM-2: Multimodal Large Representation Models for Recommendation

Chao Zhang; Haoxin Zhang; Shiwei Wu; Di Wu; Tong Xu; Xiangyu Zhao; Yan Gao; Yao Hu; Enhong Chen

arXiv:2405.16789·cs.IR·January 22, 2025·1 cites

TL;DR

NoteLLM-2 introduces a multimodal representation framework that strengthens the integration of visual information into large language models. It addresses the tendency of fine-tuned LLMs to neglect image content and improves item-to-item recommendation performance.

Contribution

The paper presents an end-to-end fine-tuning approach that combines any existing LLM with a vision encoder for multimodal representation. It also introduces two methods that strengthen the visual signal in the resulting embeddings, improving multimodal recommendation.
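
The page does not say how the LLM and the vision encoder are wired together. A common pattern, assumed here purely for illustration, works in three steps. First, project the vision encoder's patch features through a small connector into the LLM's embedding width. Second, place those visual tokens ahead of the text tokens. Third, take the last hidden state as the item embedding. The linear connector and the helpers `build_multimodal_sequence` and `last_token_pool` are hypothetical, and the toy vectors stand in for real model outputs.

```python
from typing import List, Sequence

Vector = List[float]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> Vector:
    """y = W x + b, with weight stored as rows of shape [out_dim][in_dim]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def build_multimodal_sequence(
    patch_features: Sequence[Sequence[float]],
    text_token_embs: Sequence[Sequence[float]],
    conn_w: Sequence[Sequence[float]],
    conn_b: Sequence[float],
) -> List[Vector]:
    """Hypothetical connector: project each visual patch into the LLM embedding
    width, then place the visual tokens before the text tokens."""
    visual_tokens = [linear(f, conn_w, conn_b) for f in patch_features]
    return visual_tokens + [list(t) for t in text_token_embs]


def last_token_pool(hidden_states: Sequence[Sequence[float]]) -> Vector:
    """Assumed pooling choice: the final position's hidden state is the item embedding."""
    return list(hidden_states[-1])


# Toy example: 2 patches of width 3 projected to an LLM width of 2, plus 1 text token.
patches = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
text = [[0.5, -0.5]]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
b = [0.0, 0.0]
seq = build_multimodal_sequence(patches, text, W, b)
print(seq)                   # [[1.0, 5.0], [0.0, 1.0], [0.5, -0.5]]
# Stand-in only: no LLM forward pass runs here, so we pool the input sequence itself.
print(last_token_pool(seq))  # [0.5, -0.5]
```

In a real system the assembled sequence would pass through the LLM before pooling. Here `last_token_pool` is applied directly to the input sequence so the example stays self-contained.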

Findings

01

Effective multimodal representations achieved through prompt-based and late fusion techniques.

02

Enhanced recommendation performance demonstrated in extensive online and offline experiments.

03

Framework successfully balances visual and textual information for better item-to-item recommendations.
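
Findings 01 and 03 mention late fusion and item-to-item recommendation but give no formulas. The sketch below assumes one simple reading. An element-wise sigmoid gate mixes a visual embedding v into the LLM's text-side embedding t as z = g * v + (1 - g) * t, and related items are then retrieved by brute-force cosine similarity. The gate form, the equal widths of t and v, and the helpers `late_fuse` and `top_k_related` are hypothetical, not the paper's published method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def late_fuse(
    text_emb: Sequence[float],
    visual_emb: Sequence[float],
    gate_w: Sequence[Sequence[float]],
    gate_b: Sequence[float],
) -> List[float]:
    """Assumed gated late fusion: z_i = g_i * v_i + (1 - g_i) * t_i,
    where g = sigmoid(W [t; v] + b) and W has shape [d][2d]."""
    d = len(text_emb)
    if len(visual_emb) != d or len(gate_w) != d or len(gate_b) != d:
        raise ValueError("text/visual embeddings and gate parameters must share width d")
    concat = list(text_emb) + list(visual_emb)
    fused = []
    for i in range(d):
        g = _sigmoid(sum(w * x for w, x in zip(gate_w[i], concat)) + gate_b[i])
        fused.append(g * visual_emb[i] + (1.0 - g) * text_emb[i])
    return fused


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (na * nb)


def top_k_related(
    query_id: str, embeddings: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Brute-force I2I retrieval: rank every other item by cosine similarity to the query."""
    q = embeddings[query_id]
    scored = [(item, cosine(q, emb)) for item, emb in embeddings.items() if item != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# With an all-zero gate, g = 0.5 everywhere, so fusion is the plain average of t and v.
print(late_fuse([1.0, 0.0], [0.0, 1.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0]))  # [0.5, 0.5]

notes = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print([item for item, _ in top_k_related("a", notes, 2)])  # ['b', 'c']
```

A deployed I2I system would normally replace the linear scan in `top_k_related` with an approximate nearest-neighbour index. The scan is kept here only so the example stays self-contained.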

Abstract

Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks. However, their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains underexplored. While leveraging existing Multimodal Large Language Models (MLLMs) for such tasks is promising, challenges arise due to their delayed release compared to corresponding LLMs and the inefficiency in representation tasks. To address these issues, we propose an end-to-end fine-tuning method that customizes the integration of any existing LLMs and vision encoders for efficient multimodal representation. Preliminary experiments revealed that fine-tuned LLMs often neglect image content. To counteract this, we propose NoteLLM-2, a novel framework that enhances visual information. Specifically, we propose two approaches: first, a prompt-based method that…
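
The abstract does not state the fine-tuning objective. A widely used choice for I2I representation learning, assumed here, is an in-batch contrastive (InfoNCE) loss: each anchor item should score its related item higher than every other item in the batch. The function `in_batch_info_nce` and its default temperature of 0.05 are illustrative, not values taken from the paper.

```python
import math
from typing import List, Sequence


def _l2_normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def in_batch_info_nce(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    temperature: float = 0.05,
) -> float:
    """Mean InfoNCE loss: anchors[i] should match positives[i]; the other positives
    in the batch serve as negatives. Similarity is cosine (dot of L2-normalized vectors)."""
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need a non-empty batch of aligned (anchor, positive) pairs")
    a = [_l2_normalize(v) for v in anchors]
    p = [_l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / len(a)


batch_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_info_nce(batch_a, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_info_nce(batch_a, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Correctly paired embeddings give a loss near zero, while swapping the positives pushes it to roughly 1 / temperature. That gap is what pulls related items together during fine-tuning.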

Code & Models

Repositories

applied-machine-learning-lab/notellm
PyTorch · Official

Datasets

Sherirto/BD4UI
dataset · 35 dl

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques

Methods: Focus