Personalized Large Vision-Language Models
Chau Pham, Hoang Phan, David Doermann, Yunjie Tian

TL;DR
This paper introduces PLVM, a personalized vision-language model that enables interactive, referential dialogues with continuous concept addition, using a lightweight Aligner for effective image referencing.
Contribution
The paper presents a novel personalized LVLM with an Aligner that aligns referential concepts to images, allowing continuous concept addition without extra costs.
Findings
PLVM outperforms existing models in personalized image-language tasks.
The Aligner effectively recognizes referential concepts with negligible computational overhead.
Qualitative and quantitative analyses confirm PLVM's superiority.
Abstract
Personalization has gained significant attention in image generation yet remains underexplored for large vision-language models (LVLMs). With personalization, LVLMs can handle interactive dialogues using referential concepts (e.g., "Mike and Susan are talking.") instead of the generic form (e.g., "a boy and a girl are talking."), making the conversation more customizable and referentially friendly. In addition, PLVM can continuously add new concepts during a dialogue without incurring additional costs, which significantly enhances its practicality. PLVM introduces Aligner, a pre-trained visual encoder that aligns referential concepts with the queried images. During a dialogue, it extracts features of the reference images associated with these concepts and recognizes them in the queried image, enabling personalization. We note that the computational…
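The reference-and-recognize idea described in the abstract can be illustrated with a toy sketch. This is not the authors' Aligner: it assumes that feature vectors have already been produced by some frozen visual encoder, and it swaps in plain cosine similarity with a fixed threshold as the matching rule. The names `ConceptBank`, `add_concept`, `recognize`, and the threshold value are all hypothetical, chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ConceptBank:
    """Hypothetical store mapping concept names to reference features."""

    def __init__(self, threshold=0.8):
        # Assumed similarity cutoff; the paper's actual criterion is not given here.
        self.threshold = threshold
        self.concepts = {}

    def add_concept(self, name, reference_feature):
        # Adding a concept only stores a vector; nothing is retrained.
        self.concepts[name] = list(reference_feature)

    def recognize(self, query_features):
        """Return names whose reference matches any region of the query image."""
        found = []
        for name, ref in self.concepts.items():
            best = max((cosine(ref, q) for q in query_features), default=0.0)
            if best >= self.threshold:
                found.append(name)
        return found


bank = ConceptBank(threshold=0.8)
bank.add_concept("Mike", [1.0, 0.0, 0.0])
bank.add_concept("Susan", [0.0, 1.0, 0.0])

# Toy features for two regions of a queried image (assumed encoder output).
query = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
present = bank.recognize(query)
```

In this sketch, registering a new concept mid-dialogue is a single dictionary insert, which loosely mirrors the claim that concepts can be added without additional training cost. The recognized names could then be passed to the language model as part of the prompt.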
Taxonomy
Topics: Geographic Information Systems Studies · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need · ALIGN
