Task Bias in Vision-Language Models

Sachit Menon; Ishaan Preetam Chandratreya; Carl Vondrick

arXiv:2212.04412 · cs.CV · December 9, 2022 · 1 citation

Open Access

TL;DR

This paper investigates task bias in CLIP's visual representation, shows that the task a given image's representation favors is unpredictable and varies from image to image, and proposes learned visual prompts to steer the representation towards a specific task.

Contribution

It shows how to learn a visual prompt, which can be independent of the input image, that mitigates task bias in vision-language models and gives task-specific control over the representation.

Findings

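01 CLIP's visual representation is often strongly biased towards solving some tasks more than others.

02 Which task the representation favors is unpredictable, with little consistency across images.

03 Learned visual prompts, even ones independent of the input image, can steer the representation towards the desired task.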

Abstract

Incidental supervision from language has become a popular approach for learning generic visual representations that can be prompted to perform many recognition tasks in computer vision. We conduct an in-depth exploration of the CLIP model and show that its visual representation is often strongly biased towards solving some tasks more than others. Moreover, which task the representation will be biased towards is unpredictable, with little consistency across images. To resolve this task bias, we show how to learn a visual prompt that guides the representation towards features relevant to the task of interest. Our results show that these visual prompts can be independent of the input image and still effectively provide a conditioning mechanism to steer visual representations towards the desired task.
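The idea of an input-independent visual prompt can be illustrated with a small toy sketch. This is not the authors' implementation and does not use CLIP: the frozen encoder `encode`, the label embeddings `T`, the two toy tasks, and the additive form of the prompt are all assumptions made for illustration. One shared vector is added to every input and trained by gradient descent while the encoder and label embeddings stay fixed, so that predictions over a joint label set move towards the labels of the task of interest.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_EMB = 16, 8                 # toy sizes (hypothetical)
TASK_A, TASK_B = [0, 1], [2, 3]     # label indices of two toy tasks (hypothetical)

# Frozen stand-ins for a pretrained image encoder and text label embeddings.
W = rng.normal(size=(D_EMB, D_IN)) / np.sqrt(D_IN)
T = rng.normal(size=(len(TASK_A) + len(TASK_B), D_EMB))

def encode(x):
    """Frozen toy image encoder: tanh(W x). Never updated."""
    return np.tanh(x @ W.T)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad(x, y, prompt):
    """Cross-entropy over all labels, and its gradient w.r.t. the shared prompt."""
    n = x.shape[0]
    h = encode(x + prompt)              # the same prompt is added to every input
    p = softmax(h @ T.T)                # scores against labels of both tasks
    loss = -np.log(p[np.arange(n), y]).mean()
    dlogits = p.copy()
    dlogits[np.arange(n), y] -= 1.0
    dh = dlogits @ T / n                # backprop through the label similarities
    dpre = dh * (1.0 - h ** 2)          # backprop through tanh
    dinput = dpre @ W                   # backprop through the linear layer
    return loss, dinput.sum(axis=0)     # prompt is shared, so sum over the batch

def train_prompt(x, y, steps=500, lr=0.01):
    """Plain gradient descent on the prompt only; encoder and labels stay frozen."""
    prompt = np.zeros(D_IN)
    for _ in range(steps):
        _, grad = loss_and_grad(x, y, prompt)
        prompt -= lr * grad
    return prompt

def task_b_rate(x, prompt):
    """Fraction of inputs whose top-scoring label belongs to task B."""
    pred = (encode(x + prompt) @ T.T).argmax(axis=1)
    return np.isin(pred, TASK_B).mean()

# Toy data: every input has a task-B label derived from its first feature.
x = rng.normal(size=(256, D_IN))
y = np.where(x[:, 0] > 0, TASK_B[1], TASK_B[0])

zero = np.zeros(D_IN)
initial_loss, _ = loss_and_grad(x, y, zero)
prompt = train_prompt(x, y)
final_loss, _ = loss_and_grad(x, y, prompt)

print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
print(f"task-B prediction rate: {task_b_rate(x, zero):.2f} -> {task_b_rate(x, prompt):.2f}")
```

Because only `prompt` receives gradient updates, any change in which task's labels win comes from the conditioning signal rather than from retraining the encoder. In a real setting the toy encoder would be replaced by a frozen pretrained vision-language model and the prompt would live in pixel space; those details are not specified in the abstract and are left out here.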

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Contrastive Language-Image Pre-training