Personalized Large Vision-Language Models

Chau Pham; Hoang Phan; David Doermann; Yunjie Tian

arXiv:2412.17610 · cs.CV · December 24, 2024

TL;DR

This paper introduces PLVM, a personalized vision-language model that supports interactive, referential dialogues and continuous concept addition, using a lightweight Aligner to match referential concepts to the queried image.

Contribution

The paper presents a novel personalized LVLM with an Aligner that aligns referential concepts to images, allowing continuous concept addition without extra costs.

Findings

1. PLVM outperforms existing models in personalized image-language tasks.
2. The Aligner effectively recognizes referential concepts with negligible computational overhead.
3. Qualitative and quantitative analyses confirm PLVM's superiority.

Abstract

Personalization has gained significant attention in image generation, yet it remains underexplored for large vision-language models (LVLMs). With personalization, LVLMs go beyond generic descriptions and handle interactive dialogues using referential concepts (e.g., "Mike and Susan are talking.") instead of the generic form (e.g., "a boy and a girl are talking."), making conversations more customizable and referentially friendly. In addition, PLVM can continuously add new concepts during a dialogue without incurring additional costs, which significantly enhances its practicality. PLVM introduces Aligner, a pre-trained visual encoder that aligns referential concepts with the queried images. During a dialogue, Aligner extracts features from reference images paired with their corresponding concepts and recognizes those concepts in the queried image, enabling personalization. We note that the computational…
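The abstract describes the recognition step only at a high level: reference-image features are stored with their concept names and later matched against the queried image. The sketch below illustrates that idea under loudly stated assumptions and is not PLVM's implementation. The feature vectors are hand-written toy numbers standing in for Aligner outputs, per-region matching by cosine similarity with a fixed threshold is a swapped-in stand-in for the paper's recognition mechanism, and `ConceptMemory`, `add_concept`, `recognize`, and `describe` are hypothetical names. It shows why adding a concept mid-dialogue can be cheap: registering one only stores a vector and involves no retraining.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class ConceptMemory:
    """Hypothetical store of reference features keyed by concept name.

    Assumption: in PLVM the features would come from the Aligner encoder;
    here they are supplied directly. Adding a concept is a dict insert,
    with no gradient step or fine-tuning.
    """
    threshold: float = 0.8  # hypothetical minimum similarity for a match
    concepts: dict[str, list[float]] = field(default_factory=dict)

    def add_concept(self, name: str, reference_feature: list[float]) -> None:
        self.concepts[name] = reference_feature

    def recognize(self, region_features: list[list[float]]) -> list[str | None]:
        """Label each query-image region with its best-matching concept, or None."""
        labels: list[str | None] = []
        for region in region_features:
            scored = [(cosine(region, ref), name) for name, ref in self.concepts.items()]
            best_score, best_name = max(scored, default=(0.0, None))
            labels.append(best_name if best_score >= self.threshold else None)
        return labels


def describe(labels: list[str | None], generic_nouns: list[str], action: str) -> str:
    """Use a concept name where one was recognized, else fall back to the generic noun."""
    names = [label if label else noun for label, noun in zip(labels, generic_nouns)]
    sentence = f"{' and '.join(names)} {action}."
    return sentence[0].upper() + sentence[1:]


# Toy usage: two concepts registered from (made-up) reference features.
memory = ConceptMemory()
memory.add_concept("Mike", [1.0, 0.0, 0.1])
memory.add_concept("Susan", [0.0, 1.0, 0.1])

regions = [[0.9, 0.1, 0.1], [0.1, 0.95, 0.0]]  # made-up query-image region features
labels = memory.recognize(regions)
print(describe(labels, ["a boy", "a girl"], "are talking"))

# A new concept added mid-dialogue is usable immediately, without retraining.
memory.add_concept("Rex", [0.0, 0.0, 1.0])
print(memory.recognize([[0.0, 0.0, 1.0]]))
```

With these toy numbers the first print yields "Mike and Susan are talking.", mirroring the abstract's example. A region that matches no stored concept falls back to the generic noun, which is the behavior a non-personalized LVLM would show for every region.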

Taxonomy

Topics: Geographic Information Systems Studies · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Softmax · Attention Is All You Need · ALIGN