Zero-Shot Recommendations with Pre-Trained Large Language Models for Multimodal Nudging
Rachel M. Harrison, Anton Dereventsov, Anton Bibin

TL;DR
This paper introduces a zero-shot recommendation method for multimodal content using pre-trained large language models to generate unified semantic embeddings, enabling similarity-based recommendations without additional training.
Contribution
It proposes a novel approach that leverages pre-trained LLMs to unify multimodal content representations for zero-shot recommendation tasks.
Findings
Demonstrated in a synthetic multimodal nudging environment
Handles tabular, textual, and visual data by rendering each modality as a textual description
No additional learning required for recommendations
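The first step of the approach is to turn every modality into text before any embedding is computed. The sketch below illustrates that step. It assumes that a content item carries a tabular record, a free-text message, and an image caption. The caption is assumed to come from some image-to-text model that is not shown. The `ContentItem` class, the `render_as_text` function, and the "Attributes / Message / Image" template are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentItem:
    """Hypothetical container for one multimodal content item (e.g. a nudge)."""
    item_id: str
    table: dict = field(default_factory=dict)  # tabular attributes
    text: str = ""                             # free-text body
    image_caption: Optional[str] = None        # assumed output of an image-captioning model


def render_as_text(item: ContentItem) -> str:
    """Flatten every available modality into a single textual description."""
    parts = []
    if item.table:
        # Tabular data: one "column: value" clause per attribute, sorted by
        # column name so the rendering is deterministic.
        attrs = "; ".join(f"{k}: {v}" for k, v in sorted(item.table.items()))
        parts.append(f"Attributes: {attrs}.")
    if item.text:
        parts.append(f"Message: {item.text.strip()}")
    if item.image_caption:
        parts.append(f"Image: {item.image_caption.strip()}")
    return " ".join(parts)


nudge = ContentItem(
    item_id="n1",
    table={"channel": "email", "tone": "friendly"},
    text="Don't forget to finish your workout today!",
    image_caption="A person jogging in a park at sunrise.",
)
print(render_as_text(nudge))
# Attributes: channel: email; tone: friendly. Message: Don't forget to finish your workout today! Image: A person jogging in a park at sunrise.
```

A fixed template keeps renderings consistent across items. This matters because the resulting strings are what the embedding model compares.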
Abstract
We present a method for zero-shot recommendation of multimodal non-stationary content that leverages recent advancements in the field of generative AI. We propose rendering inputs of different modalities as textual descriptions and utilizing pre-trained LLMs to obtain their numerical representations by computing semantic embeddings. Once unified representations of all content items are obtained, the recommendation can be performed by computing an appropriate similarity metric between them without any additional learning. We demonstrate our approach on a synthetic multimodal nudging environment, where the inputs consist of tabular, textual, and visual data.
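Once every item is a text string, recommendation reduces to embedding the strings and ranking them by similarity, with no training step. The sketch below shows that retrieval loop and makes two assumptions. First, the similarity metric is cosine similarity; the abstract only says "an appropriate similarity metric." Second, `toy_embed` is a hypothetical bag-of-words stand-in for the pre-trained LLM embedding model, used so the example runs offline. It captures word overlap only, not semantics. In practice you would pass a real LLM embedding function as `embed`.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Mapping, Tuple

SparseVector = Mapping[str, float]


def toy_embed(text: str) -> SparseVector:
    """Hypothetical stand-in for a pre-trained LLM embedding model.

    Returns a bag-of-words count vector; unlike an LLM embedding it
    reflects shared words, not shared meaning.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: SparseVector, v: SparseVector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query: str, catalog: Dict[str, str], k: int = 3,
              embed: Callable[[str], SparseVector] = toy_embed) -> List[Tuple[str, float]]:
    """Return the k catalog items most similar to the query; nothing is trained."""
    # Embed each rendered item once; with a real model these vectors could be cached.
    item_vecs = {item_id: embed(desc) for item_id, desc in catalog.items()}
    q = embed(query)
    ranked = sorted(((item_id, cosine(q, vec)) for item_id, vec in item_vecs.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


catalog = {
    "run": "Attributes: tone: friendly. Message: Go for a morning run in the park.",
    "water": "Attributes: tone: neutral. Message: Remember to drink a glass of water.",
    "sleep": "Attributes: tone: calm. Message: Try going to bed thirty minutes earlier.",
}
print([item_id for item_id, _ in recommend("user enjoyed a morning run in the park", catalog)])
# ['run', 'water', 'sleep']
```

New or changed content only needs to be rendered and embedded before it becomes recommendable. This is what lets a similarity-based scheme cope with a non-stationary catalog without retraining.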
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Multimodal Machine Learning Applications
