Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models

Yuqing Liu; Yu Wang; Lichao Sun; Philip S. Yu

arXiv:2402.08670·cs.AI·February 14, 2024·3 cites

Open Access

TL;DR

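Rec-GPT4V introduces a novel reasoning scheme that uses large vision-language models (LVLMs) for multimodal recommendation. It supplies the model with user preferences drawn from interaction history and with text summaries of item images, addressing two limitations of existing LVLMs: missing user-preference knowledge and difficulty handling multiple images.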

Contribution

The paper proposes Rec-GPT4V, a reasoning scheme called Visual-Summary Thought (VST) that enhances LVLM-based multimodal recommendation by combining in-context user preferences with LVLM-generated summaries of item images.

Findings

01. VST improves recommendation accuracy across datasets.
02. LVLMs effectively generate item image summaries.
03. Rec-GPT4V outperforms baseline models in experiments.

Abstract

The development of large vision-language models (LVLMs) offers the potential to address challenges faced by traditional multimodal recommendation, thanks to their proficient understanding of static images and textual dynamics. However, the application of LVLMs in this field is still limited due to the following complexities. First, LVLMs lack user preference knowledge, as they are trained on vast general datasets. Second, LVLMs struggle to handle multiple-image dynamics in scenarios involving discrete, noisy, and redundant image sequences. To overcome these issues, we propose a novel reasoning scheme named Rec-GPT4V: Visual-Summary Thought (VST), which leverages large vision-language models for multimodal recommendation. We utilize user history as in-context user preferences to address the first challenge. Next, we prompt LVLMs to generate item image summaries and utilize…
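The abstract outlines a two-step prompting scheme: user history is given to the model as in-context preferences, and the LVLM converts item images into text summaries. The Python sketch below shows one way such a pipeline might be wired together. Everything in it is an assumption for illustration: the `LVLM` callable type, the `Item` record, the prompt wording, the `fake_lvlm` stub, and the final ranking and parsing step (the abstract is truncated before it says how the summaries are used) are hypothetical, not the authors' implementation or prompts.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical interface: any LVLM wrapped as a function that takes a text
# prompt plus zero or more image references and returns generated text.
LVLM = Callable[[str, Sequence[str]], str]


@dataclass
class Item:
    item_id: str
    title: str
    image: str  # path or URL of the item image (assumed format)


# Illustrative prompt wording; not the prompt used in the paper.
SUMMARY_PROMPT = "Summarize the product shown in this image in one sentence."


def summarize_images(lvlm: LVLM, items: Sequence[Item]) -> List[str]:
    """Step 1: ask the LVLM for a short text summary of each item image,
    one image per call, so the model never sees a long noisy image sequence."""
    return [lvlm(SUMMARY_PROMPT, [item.image]) for item in items]


def build_ranking_prompt(history: Sequence[Item], summaries: Sequence[str],
                         candidates: Sequence[Item]) -> str:
    """Step 2 (assumed): put the user's history (titles plus image summaries)
    in context as preferences, then list the candidate items to rank."""
    lines = ["The user previously interacted with these items:"]
    for item, summary in zip(history, summaries):
        lines.append(f"- {item.title}: {summary}")
    lines.append("Rank the following candidate items by how likely the user "
                 "is to interact with them next. Answer with item ids only, "
                 "one per line, most likely first.")
    for cand in candidates:
        lines.append(f"[{cand.item_id}] {cand.title}")
    return "\n".join(lines)


def parse_ranking(answer: str, candidates: Sequence[Item]) -> List[str]:
    """Keep valid candidate ids (optionally in brackets) in the order the
    model listed them, skip duplicates and unknown ids, and append any
    candidates the model omitted in their original order."""
    valid = {c.item_id for c in candidates}
    ranked: List[str] = []
    for line in answer.splitlines():
        token = line.strip().strip("[]")
        if token in valid and token not in ranked:
            ranked.append(token)
    ranked.extend(c.item_id for c in candidates if c.item_id not in ranked)
    return ranked


def recommend(lvlm: LVLM, history: Sequence[Item],
              candidates: Sequence[Item]) -> List[str]:
    summaries = summarize_images(lvlm, history)
    prompt = build_ranking_prompt(history, summaries, candidates)
    # Text-only query: the history images were already turned into summaries.
    return parse_ranking(lvlm(prompt, []), candidates)


# Usage with a stub standing in for a real LVLM client (hypothetical).
def fake_lvlm(prompt: str, images: Sequence[str]) -> str:
    if images:
        return f"summary of {images[0]}"
    return "c2\nc1\nunknown"


history = [Item("h1", "Red lipstick", "h1.jpg")]
candidates = [Item("c1", "Mascara", "c1.jpg"),
              Item("c2", "Lip gloss", "c2.jpg"),
              Item("c3", "Shampoo", "c3.jpg")]
print(recommend(fake_lvlm, history, candidates))  # → ['c2', 'c1', 'c3']
```

In this sketch, summarizing one image per call is what sidesteps the multi-image weakness the abstract describes: the final ranking query is text-only, so the model reasons over summaries and titles rather than a sequence of raw images.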

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques