VIP5: Towards Multimodal Foundation Models for Recommendation
Shijie Geng, Juntao Tan, Shuchang Liu, Zuohui Fu, Yongfeng Zhang

TL;DR
This paper introduces VIP5, a multimodal foundation model that unifies visual, textual, and personalization modalities for recommendation tasks. It processes all modalities in a shared architecture and uses parameter-efficient training to improve both recommendation quality and training efficiency.
Contribution
The paper proposes VIP5, a novel multimodal foundation model for recommendation that integrates multiple modalities using personalized prompts and a parameter-efficient training approach.
Findings
Improved recommendation accuracy with multimodal data
Efficient training via lightweight adapters
Unified architecture for diverse modalities
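To make the "lightweight adapters" finding concrete, here is a minimal sketch of why adapter-style training is cheap. It assumes a generic bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection) attached to a frozen backbone layer; the summary above does not specify VIP5's exact adapter design, so the function names and layer sizes are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a generic bottleneck adapter, not the VIP5 authors' code.
# All names and sizes below are hypothetical.

def linear_params(d_in, d_out):
    """Number of weights plus biases in one dense layer."""
    return d_in * d_out + d_out

def adapter_params(d_model, bottleneck):
    """Trainable parameters: down-projection followed by up-projection."""
    return linear_params(d_model, bottleneck) + linear_params(bottleneck, d_model)

def ffn_params(d_model, d_ff):
    """Parameters of a standard two-layer feed-forward block (kept frozen)."""
    return linear_params(d_model, d_ff) + linear_params(d_ff, d_model)

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def adapter_forward(x, W_down, W_up):
    """x + W_up * relu(W_down * x); biases omitted to keep the sketch short."""
    hidden = [max(0.0, h) for h in matvec(W_down, x)]
    delta = matvec(W_up, hidden)
    return [xi + di for xi, di in zip(x, delta)]

# Assumed sizes for illustration, not taken from the paper.
D_MODEL, D_FF, BOTTLENECK = 512, 2048, 64
adapter_count = adapter_params(D_MODEL, BOTTLENECK)
ffn_count = ffn_params(D_MODEL, D_FF)
print(f"adapter: {adapter_count} params, frozen FFN: {ffn_count} params, "
      f"ratio: {adapter_count / ffn_count:.3f}")
```

With these assumed sizes the adapter trains only a few percent as many parameters as the feed-forward block it sits beside. If the up-projection starts at zero, the residual connection makes the adapter an identity map, so training begins from the frozen backbone's behavior.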
Abstract
Computer Vision (CV), Natural Language Processing (NLP), and Recommender Systems (RecSys) are three prominent AI applications that have traditionally developed independently, resulting in disparate modeling and engineering methodologies. This has impeded the ability for these fields to directly benefit from each other's advancements. With the recent development of foundation models, large language models have emerged as a potential general-purpose interface for unifying different modalities and problem formulations. In light of this, we propose the development of a multimodal foundation model (MFM) considering visual, textual, and personalization modalities under the P5 recommendation paradigm, thus named VIP5 (Visual P5), to unify various modalities and recommendation tasks. This will enable the processing of multiple modalities in a shared architecture for improved recommendations. To…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
