Ego: Embedding-Guided Personalization of Vision-Language Models
Soroush Seifi, Simon Gardier, Vaggelis Dorovatas, Daniel Olmeda Reino, Rahaf Aljundi

TL;DR
This paper introduces an efficient method for personalizing vision-language models by extracting and utilizing internal attention-based visual tokens to recall specific concepts, enhancing personalized AI assistant capabilities.
Contribution
The proposed approach leverages the model's internal attention to personalize vision-language models without additional training, improving scalability and deployment efficiency.
Findings
Strong performance gains over SOTA methods
Effective across single, multi-concept, and video personalization
Minimal personalization overhead
Abstract
AI assistants that support humans in daily life are becoming increasingly feasible, driven by the rapid advancements in multimodal language models. A key challenge lies in overcoming the generic nature of these models to deliver personalized experiences. Existing approaches to personalizing large vision language models often rely on additional training stages, which limit generality and scalability, or on engineered pipelines with external pre-trained modules, which hinder deployment efficiency. In this work, we propose an efficient personalization method that leverages the model's inherent ability to capture personalized concepts. Specifically, we extract visual tokens that predominantly represent the target concept by utilizing the model's internal attention mechanisms. These tokens serve as a memory of that specific concept, enabling the model to recall and describe it when it…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling
