Multi-modal Generative Models in Recommendation System

Arnau Ramisa; Rene Vidal; Yashar Deldjoo; Zhankui He; Julian McAuley; Anton Korikov; Scott Sanner; Mahesh Sathiamoorthy; Atoosa Kasrizadeh; Silvia Milano; and Francesco Ricci

arXiv:2409.10993·cs.IR·September 18, 2024

TL;DR

This paper reviews multi-modal generative models for recommendation systems, which integrate modalities such as text and images to support richer user interactions and better product understanding.

Contribution

It reviews approaches that leverage multiple data modalities simultaneously to enhance recommendation systems with richer interactions and better product understanding.

Findings

01

Multi-modal models improve recommendation relevance.

02

Visual and textual data integration enhances user experience.

03

Existing systems often treat modalities independently.

Abstract

Many recommendation systems limit user inputs to text strings or behavior signals such as clicks and purchases, and system outputs to a list of products sorted by relevance. With the advent of generative AI, users have come to expect richer levels of interactions. In visual search, for example, a user may provide a picture of their desired product along with a natural language modification of the content of the picture (e.g., a dress like the one shown in the picture but in red color). Moreover, users may want to better understand the recommendations they receive by visualizing how the product fits their use case, e.g., with a representation of how a garment might look on them, or how a furniture item might look in their room. Such advanced levels of interaction require recommendation systems that are able to discover both shared and complementary information about the product across…
