VisualLens: Personalization through Task-Agnostic Visual History
Wang Bill Zhu, Deqing Fu, Kai Sun, Yi Lu, Zhaojiang Lin, Seungwhan Moon, Kanika Narang, Mustafa Canim, Yue Liu, Anuj Kumar, Xin Luna Dong

TL;DR
VisualLens introduces a novel personalization framework leveraging users' daily life images and multimodal large language models, outperforming existing multimodal recommendation methods in accuracy and robustness.
Contribution
The paper presents VisualLens, a new approach that uses task-agnostic visual histories and multimodal models for personalized recommendations, along with two new benchmark datasets.
Findings
VisualLens improves Hit@3 by 5-10% over existing multimodal recommendation methods.
It outperforms GPT-4o by 2-5%.
The method is robust across different history lengths and content categories.
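For readers unfamiliar with the Hit@3 metric cited above, here is a minimal sketch of how Hit@k is commonly computed. The paper's exact evaluation protocol is not given in this summary, so the definition here (a hit if any ground-truth item appears in the top k of the ranked list) is an assumption, and the function names `hit_at_k` and `mean_hit_at_k` are hypothetical.

```python
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """Return 1.0 if any of the top-k ranked items is relevant, else 0.0.

    Assumed (common) definition of Hit@k; not taken from the paper.
    """
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean_hit_at_k(
    rankings: Sequence[Sequence[str]],
    relevants: Sequence[Set[str]],
    k: int = 3,
) -> float:
    """Average Hit@k over a set of recommendation queries."""
    hits = [hit_at_k(r, g, k) for r, g in zip(rankings, relevants)]
    return sum(hits) / len(hits) if hits else 0.0


# Toy example with made-up item IDs (not data from the paper).
rankings = [["a", "b", "c", "d"], ["a", "b", "c", "d"]]
relevants = [{"c"}, {"d"}]
score = mean_hit_at_k(rankings, relevants, k=3)  # first query hits, second misses
```

Under this definition, a 5-10% gain on Hit@3 would mean that for 5 to 10 more queries out of every 100, at least one relevant item appears among the top three recommendations.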
Abstract
Existing recommendation systems either rely on user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. However, item-based histories are not always accessible, and are not generalizable for multimodal recommendation. We hypothesize that a user's visual history -- comprising images from daily life -- can offer rich, task-agnostic insights into their interests and preferences, and thus be leveraged for effective personalization. To this end, we propose VisualLens, a novel framework that leverages multimodal large language models (MLLMs) to enable personalization using task-agnostic visual history. VisualLens extracts, filters, and refines a spectrum user profile from the visual history to support personalized recommendation. We created two new benchmarks, Google-Review-V and Yelp-V, with task-agnostic visual histories, and show that…
Taxonomy
Topics: Digital Games and Media · Digital Humanities and Scholarship
