Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond

Tianxin Wei; Bowen Jin; Ruirui Li; Hansi Zeng; Zhengyang Wang; Jianhui Sun; Qingyu Yin; Hanqing Lu; Suhang Wang; Jingrui He; Xianfeng Tang

arXiv:2403.10667 · cs.IR · March 29, 2024 · 2 cites

TL;DR

This paper introduces UniMP, a unified multi-modal personalization framework leveraging large vision-language models to handle diverse personalized tasks across modalities, demonstrating superior performance on a new comprehensive benchmark.

Contribution

The paper proposes a generic, extensible generative framework for multi-modal personalization that unifies various tasks and modalities, utilizing foundational models for improved user-specific personalization.

Findings

01. Outperforms specialized methods on a new multi-modal benchmark
02. Effectively handles diverse personalized tasks, including recommendation and image generation
03. Demonstrates the flexibility of large vision-language models in personalization

Abstract

Developing a universal model that can effectively harness heterogeneous resources and respond to a wide range of personalized needs has been a longstanding community aspiration. Our daily choices, especially in domains like fashion and retail, are substantially shaped by multi-modal data, such as pictures and textual descriptions. These modalities not only offer intuitive guidance but also cater to personalized user preferences. However, the predominant personalization approaches mainly focus on the ID or text-based recommendation problem, failing to comprehend the information spanning various tasks or modalities. In this paper, our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP), which effectively leverages multi-modal data while eliminating the complexities associated with task- and modality-specific customization. We argue that the advancements…

Code & Models

Repositories

weitianxin/UniMP (PyTorch, official)

Taxonomy

Topics: Recommender Systems and Techniques · Image Retrieval and Classification Techniques · Topic Modeling

Methods: Focus