VisualLens: Personalization through Task-Agnostic Visual History

Wang Bill Zhu; Deqing Fu; Kai Sun; Yi Lu; Zhaojiang Lin; Seungwhan Moon; Kanika Narang; Mustafa Canim; Yue Liu; Anuj Kumar; Xin Luna Dong

arXiv:2411.16034 · cs.CV · October 21, 2025

TL;DR

VisualLens introduces a novel personalization framework leveraging users' daily life images and multimodal large language models, outperforming existing multimodal recommendation methods in accuracy and robustness.

Contribution

The paper presents VisualLens, a new approach that uses task-agnostic visual histories and multimodal models for personalized recommendations, along with two new benchmark datasets.

Findings

01. VisualLens improves recommendation accuracy by 5-10% on Hit@3 over existing multimodal recommendation methods.

02. It outperforms GPT-4o by 2-5%.

03. The method is robust across different history lengths and content categories.
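
The findings above are reported in terms of Hit@3, which this page does not define. The sketch below assumes the common definition of Hit@k: a user counts as a hit if at least one relevant item appears among the top k ranked candidates, and the score is the fraction of users with a hit. The function names and the toy data are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
from typing import Dict, List, Set


def hit_at_k(ranked: List[str], relevant: Set[str], k: int = 3) -> int:
    """Return 1 if any of the top-k ranked items is relevant, else 0."""
    return int(any(item in relevant for item in ranked[:k]))


def mean_hit_at_k(
    rankings: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int = 3,
) -> float:
    """Average Hit@k over all users (assumed definition, see lead-in)."""
    hits = [hit_at_k(rankings[user], ground_truth[user], k) for user in rankings]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical toy data: each user has a ranked candidate list and a set of
# items they actually chose.
rankings = {
    "user_a": ["cafe", "museum", "ramen_bar", "park"],
    "user_b": ["gym", "bookstore", "bakery", "cinema"],
}
ground_truth = {
    "user_a": {"ramen_bar"},  # ranked 3rd, so a hit at k=3
    "user_b": {"cinema"},     # ranked 4th, so a miss at k=3
}

score = mean_hit_at_k(rankings, ground_truth, k=3)
print(score)  # 0.5
```

Under this definition, a gain of a few points on Hit@3 means a relevant item reaches the top three for that additional share of users.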

Abstract

Existing recommendation systems either rely on user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. However, item-based histories are not always accessible, and are not generalizable for multimodal recommendation. We hypothesize that a user's visual history -- comprising images from daily life -- can offer rich, task-agnostic insights into their interests and preferences, and thus be leveraged for effective personalization. To this end, we propose VisualLens, a novel framework that leverages multimodal large language models (MLLMs) to enable personalization using task-agnostic visual history. VisualLens extracts, filters, and refines a spectrum user profile from the visual history to support personalized recommendation. We created two new benchmarks, Google-Review-V and Yelp-V, with task-agnostic visual histories, and show that…

Code & Models

Datasets

kaimeta/wearables_benchmarks
dataset · 26 dl

Videos

VisualLens: Personalization through Task-Agnostic Visual History · SlidesLive

Taxonomy

Topics: Digital Games and Media · Digital Humanities and Scholarship

Methods: Focus