Contextualized Visual Personalization in Vision-Language Models

Yeongtak Oh; Sangwon Yu; Junsung Park; Han Cheol Moon; Jisoo Mok; Sungroh Yoon

arXiv:2602.03454 · cs.CV · May 20, 2026

TL;DR

This paper introduces CoViP, a framework that improves vision-language models' ability to generate personalized image captions by leveraging a user's accumulated visual-textual context, which existing models fail to associate with new images.

Contribution

The paper formalizes the challenge of contextualized visual personalization and proposes CoViP, a framework that improves personalized captioning through reinforcement-learning-based post-training and caption-augmented generation.

Findings

01

CoViP significantly improves personalized image captioning performance.

02

Existing VLMs struggle to leverage a user's visual context for personalization.

03

CoViP yields holistic gains across various downstream personalization tasks.

Abstract

Despite recent progress in vision-language models (VLMs), existing approaches often fail to generate personalized responses based on a user's specific experiences, because they cannot associate visual inputs with the user's accumulated visual-textual context. We formalize this challenge as contextualized visual personalization, which requires VLMs to visually recognize and textually retrieve a user's personalized visual experiences when interpreting new images. To address it, we propose CoViP, a unified framework that treats personalized image captioning as a core task for contextualized visual personalization and improves this capability through reinforcement-learning-based post-training and caption-augmented generation. We further introduce diagnostic evaluations that explicitly rule out textual shortcut solutions and verify whether VLMs truly leverage visual…
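The task definition above names two requirements: visually recognizing which past experience a new image relates to, and retrieving the text attached to that experience. The sketch below illustrates only that problem setup as a naive embedding-retrieval baseline; it is not CoViP, which the abstract describes as RL-based post-training plus caption-augmented generation. All names (`Memory`, `retrieve_context`, `build_caption_prompt`) and the toy 2-D embeddings are hypothetical stand-ins for real VLM image features.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Memory:
    """One past visual experience of the user (hypothetical structure)."""
    image_embedding: list[float]  # stand-in for a VLM's image features
    text: str                     # the user's textual context for that image


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_context(query: list[float], memories: list[Memory], k: int = 1) -> list[str]:
    """'Visual recognition': rank past experiences by visual similarity to the
    new image; 'textual retrieval': return the text attached to the top k."""
    ranked = sorted(memories, key=lambda m: cosine(query, m.image_embedding), reverse=True)
    return [m.text for m in ranked[:k]]


def build_caption_prompt(query: list[float], memories: list[Memory], k: int = 1) -> str:
    """Assemble a hypothetical captioning prompt from the retrieved context."""
    lines = ["User context:"] + [f"- {c}" for c in retrieve_context(query, memories, k)]
    lines.append("Describe the new image, referring to the user's context where relevant.")
    return "\n".join(lines)
```

With one memory embedded at [1, 0] and a query image at [0.9, 0.1], that memory's text is retrieved first. A purely text-matching baseline like this is exactly the kind of shortcut the paper's diagnostic evaluations are said to rule out, so the sketch should be read as a framing of the task rather than a solution.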

Code & Models

Repositories

oyt9306/CoViP (GitHub)