Understanding Visual Concepts Across Models

Brandon Trabucco; Max Gurinas; Kyle Doherty; Ruslan Salakhutdinov

arXiv:2406.07506 · cs.CV · June 12, 2024

TL;DR

This paper analyzes how large multimodal models learn and represent new visual concepts through word embeddings, revealing their model-specific nature and non-transferability across different models and tasks.

Contribution

The paper provides a large-scale analysis of learned visual concept embeddings, showing that they do not transfer across models and that perturbative solutions exist which can generate, detect, or classify arbitrary concepts.

Findings

01

Embeddings are model-specific and non-transferable.

02

Perturbations within an epsilon-ball of any prior embedding can generate, detect, or classify arbitrary concepts.

03

Popular soft prompt-tuning methods find these perturbative solutions.
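Finding 02 can be made concrete with a small sketch of the epsilon-ball constraint: given a prior word embedding (for example, the embedding of an existing token) and a candidate embedding, check whether the candidate lies within distance epsilon of the prior, and project it back onto the ball if it does not. The function names, the choice of the L2 norm, and the toy 2-D vectors are assumptions for illustration only; this page does not state which norm or optimization procedure the authors use.

```python
import math
from typing import List

Vector = List[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        # Embeddings from different models may not even share a dimension.
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def in_epsilon_ball(candidate: Vector, prior: Vector, epsilon: float,
                    tol: float = 1e-9) -> bool:
    """True if ||candidate - prior||_2 <= epsilon (up to a small tolerance)."""
    return l2_distance(candidate, prior) <= epsilon + tol


def project_to_epsilon_ball(candidate: Vector, prior: Vector,
                            epsilon: float) -> Vector:
    """Closest point to `candidate` inside the L2 ball of radius epsilon around `prior`.

    Hypothetical helper: an assumed way to keep a tuned embedding a small
    perturbation of its prior, not the paper's training code.
    """
    dist = l2_distance(candidate, prior)
    if dist <= epsilon:
        return list(candidate)
    # Shrink the offset (candidate - prior) so its length is exactly epsilon.
    scale = epsilon / dist
    return [p + scale * (c - p) for c, p in zip(candidate, prior)]


if __name__ == "__main__":
    prior = [0.0, 0.0]      # toy stand-in for an existing token embedding
    candidate = [3.0, 4.0]  # toy stand-in for a tuned concept embedding
    projected = project_to_epsilon_ball(candidate, prior, epsilon=1.0)
    print(projected, in_epsilon_ball(projected, prior, 1.0))
```

Projection after each gradient step is one common way to enforce such a constraint; the finding itself only says that solutions of this form exist and that soft prompt-tuning tends to land on them.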

Abstract

Large multimodal models such as Stable Diffusion can generate, detect, and classify new visual concepts after fine-tuning just a single word embedding. Do models learn similar words for the same concepts (e.g., <orange-cat> = orange + cat)? We conduct a large-scale analysis on three state-of-the-art models in text-to-image generation, open-set object detection, and zero-shot classification, and find that new word embeddings are model-specific and non-transferable. Across 4,800 new embeddings trained for 40 diverse visual concepts on four standard datasets, we find perturbations within an $\epsilon$-ball to any prior embedding that generate, detect, and classify an arbitrary concept. When these new embeddings are spliced into new models, fine-tuning that targets the original model is lost. We show popular soft prompt-tuning approaches find these perturbative solutions when applied to…
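The abstract's question of whether <orange-cat> behaves like orange + cat can be phrased as a simple similarity test: compare the learned concept embedding with the sum of its component word embeddings. The sketch below assumes cosine similarity and vector addition as the comparison; the helper names and the toy 3-D vectors are hypothetical and are not the authors' evaluation protocol.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero, equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def compositional_score(learned: Vector, components: List[Vector]) -> float:
    """Hypothetical score: similarity between a learned embedding and the
    element-wise sum of its component word embeddings."""
    composite = [sum(values) for values in zip(*components)]
    return cosine_similarity(learned, composite)


if __name__ == "__main__":
    orange = [1.0, 0.0, 0.0]  # toy stand-in for the "orange" embedding
    cat = [0.0, 1.0, 0.0]     # toy stand-in for the "cat" embedding
    print(compositional_score([1.0, 1.0, 0.0], [orange, cat]))  # ≈ 1.0
    print(compositional_score([0.0, 0.0, 1.0], [orange, cat]))  # 0.0
```

Under this framing, a learned embedding that works well for its concept while scoring low against orange + cat would be consistent with the abstract's report that new embeddings are model-specific small perturbations rather than compositions of existing words.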

Code & Models

Repositories

visual-words/visual-words (official)

Taxonomy

Topics: Data Visualization and Analytics · Geographic Information Systems Studies

Methods: Diffusion