VIP5: Towards Multimodal Foundation Models for Recommendation

Shijie Geng; Juntao Tan; Shuchang Liu; Zuohui Fu; Yongfeng Zhang

arXiv:2305.14302·cs.IR·October 17, 2023·1 cites

TL;DR

This paper introduces VIP5, a multimodal foundation model that handles visual, textual, and personalization modalities in a single shared architecture for recommendation tasks, and trains it parameter-efficiently with lightweight adapters.

Contribution

The paper proposes VIP5, a novel multimodal foundation model for recommendation that integrates multiple modalities using personalized prompts and a parameter-efficient training approach.

Findings

01. Improved recommendation accuracy with multimodal data
02. Efficient training via lightweight adapters
03. Unified architecture for diverse modalities

Abstract

Computer Vision (CV), Natural Language Processing (NLP), and Recommender Systems (RecSys) are three prominent AI applications that have traditionally developed independently, resulting in disparate modeling and engineering methodologies. This has impeded the ability for these fields to directly benefit from each other's advancements. With the recent development of foundation models, large language models have emerged as a potential general-purpose interface for unifying different modalities and problem formulations. In light of this, we propose the development of a multimodal foundation model (MFM) considering visual, textual, and personalization modalities under the P5 recommendation paradigm, thus named VIP5 (Visual P5), to unify various modalities and recommendation tasks. This will enable the processing of multiple modalities in a shared architecture for improved recommendations. To…

Code & Models

Repositories

jeykigung/vip5
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications