Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
Tianxin Wei, Bowen Jin, Ruirui Li, Hansi Zeng, Zhengyang Wang, Jianhui Sun, Qingyu Yin, Hanqing Lu, Suhang Wang, Jingrui He, Xianfeng Tang

TL;DR
This paper introduces UniMP, a unified multi-modal personalization framework leveraging large vision-language models to handle diverse personalized tasks across modalities, demonstrating superior performance on a new comprehensive benchmark.
Contribution
The paper proposes a generic, extensible generative framework for multi-modal personalization that unifies various tasks and modalities, utilizing foundational models for improved user-specific personalization.
Findings
Outperforms specialized methods on a new multi-modal benchmark
Effectively handles diverse personalized tasks including recommendation and image generation
Demonstrates the flexibility of large vision-language models in personalization
Abstract
Developing a universal model that can effectively harness heterogeneous resources and respond to a wide range of personalized needs has been a longstanding community aspiration. Our daily choices, especially in domains like fashion and retail, are substantially shaped by multi-modal data, such as pictures and textual descriptions. These modalities not only offer intuitive guidance but also cater to personalized user preferences. However, the predominant personalization approaches mainly focus on the ID or text-based recommendation problem, failing to comprehend the information spanning various tasks or modalities. In this paper, our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP), which effectively leverages multi-modal data while eliminating the complexities associated with task- and modality-specific customization. We argue that the advancements…
Taxonomy
Topics: Recommender Systems and Techniques · Image Retrieval and Classification Techniques · Topic Modeling
