Personalization Toolkit: Training Free Personalization of Large Vision Language Models

Soroush Seifi, Vaggelis Dorovatas, Matteo Cassinelli, Fabien Despinoy, Daniel Olmeda Reino, Rahaf Aljundi

arXiv:2502.02452 · cs.CV · April 29, 2026

TL;DR

This paper introduces the Personalization Toolkit, a training-free toolkit for personalizing large vision-language models (LVLMs). It combines pre-trained vision models, retrieval, and visual prompts to enable multi-concept personalization without any additional training.

Contribution

The paper presents a novel training-free approach for LVLM personalization that leverages pre-trained models, retrieval, and visual prompts, outperforming training-based methods.

Findings

01

Achieves state-of-the-art results on personalization benchmarks.

02

Enables multi-concept personalization across images and videos.

03

Operates without any additional training, improving efficiency.

Abstract

Personalization of Large Vision-Language Models (LVLMs) involves customizing models to recognize specific users or object instances and to generate contextually tailored responses. Existing approaches rely on time-consuming training for each item, which makes them impractical for real-world deployment; this limitation is reflected in current personalization benchmarks, which are restricted to object-centric, single-concept evaluations. In this paper, we present a novel training-free approach to LVLM personalization called the Personalization Toolkit. We also introduce a comprehensive, real-world benchmark designed to rigorously evaluate various aspects of the personalization task. The Personalization Toolkit leverages pre-trained vision foundation models to extract distinctive features, applies retrieval-augmented generation (RAG) techniques to identify instances within visual inputs, and employs visual prompting strategies to guide model outputs. Our model-agnostic…
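The abstract describes the pipeline only at a high level. As a rough, hypothetical sketch of the retrieval step (not the authors' implementation), the Python below stores reference embeddings per personal concept, matches the embedding of each detected image region against them by cosine similarity with an assumed threshold, and turns the matches into a simple text prompt. The toy 3-dimensional embeddings, the region boxes, the 0.8 threshold, and every name (`ConceptMemory`, `identify`, `build_prompt`) are illustrative assumptions; in the described approach the vectors would come from a pre-trained vision foundation model, and the paper's actual visual prompting may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins: real embeddings would come from a pre-trained vision
# foundation model; here they are short lists of floats of equal length.
Vector = List[float]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), assumed convention


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class ConceptMemory:
    """Hypothetical memory mapping each personal concept name to reference embeddings."""
    entries: Dict[str, List[Vector]] = field(default_factory=dict)

    def add(self, name: str, embedding: Vector) -> None:
        # Registering a concept only stores an embedding: no gradient updates.
        self.entries.setdefault(name, []).append(embedding)

    def retrieve(self, query: Vector, threshold: float) -> Optional[Tuple[str, float]]:
        # Exhaustive nearest-neighbour search over all stored references.
        best: Optional[Tuple[str, float]] = None
        for name, refs in self.entries.items():
            for ref in refs:
                score = cosine(query, ref)
                if best is None or score > best[1]:
                    best = (name, score)
        # Reject the best match if it is not similar enough (unknown object).
        if best is not None and best[1] >= threshold:
            return best
        return None


def identify(memory: ConceptMemory, regions: List[Tuple[Box, Vector]],
             threshold: float = 0.8) -> List[Tuple[str, Box]]:
    """Match each detected region independently, so several concepts can co-occur."""
    matches = []
    for box, emb in regions:
        hit = memory.retrieve(emb, threshold)
        if hit is not None:
            matches.append((hit[0], box))
    return matches


def build_prompt(matches: List[Tuple[str, Box]], question: str) -> str:
    # Hypothetical textual stand-in for a visual prompt: concept names plus boxes.
    lines = [f"<{name}> is at box {box}." for name, box in matches]
    return "\n".join(lines + [question])


# Usage with toy embeddings.
memory = ConceptMemory()
memory.add("my_mug", [1.0, 0.0, 0.0])
memory.add("my_dog", [0.0, 1.0, 0.0])
regions = [
    ((10, 20, 50, 60), [0.9, 0.1, 0.0]),     # close to my_mug
    ((70, 80, 120, 140), [0.1, 0.95, 0.0]),  # close to my_dog
    ((5, 5, 30, 30), [0.0, 0.0, 1.0]),       # matches nothing stored
]
matches = identify(memory, regions, threshold=0.8)
prompt = build_prompt(matches, "What is on the table?")
```

Adding a new concept in this sketch only appends an embedding to the memory, which is the sense in which a retrieval-based pipeline avoids per-item training; matching each region separately is what lets several personal concepts appear in the same image.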
