Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning

Maurits Bleeker; Mariya Hendriksen; Andrew Yates; Maarten de Rijke

arXiv:2402.17510·cs.CV·August 2, 2024·2 cites

TL;DR

This paper investigates whether contrastive vision-language models learn shortcuts instead of representations that capture all the information in the captions. It introduces a framework that injects synthetic shortcuts into image-text data to test for this behavior and evaluates methods for reducing shortcut learning, which mitigate the problem only partially.

Contribution

The paper introduces a synthetic shortcut framework for evaluating and reducing shortcut learning in contrastive vision-language models, revealing limitations of current training methods.

Findings

01. Contrastive models often learn shortcuts rather than full representations.

02. Synthetic shortcuts can be injected to evaluate shortcut learning.

03. Proposed methods partially reduce shortcut reliance.
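
To make the second finding concrete, here is a minimal sketch of what injecting a synthetic shortcut into an image-caption pair could look like. It is an illustration under stated assumptions, not the authors' implementation: the function name `inject_shortcut`, the nested-list grayscale image format, and the choice to write the identifier as a bit pattern in the top pixel row are all hypothetical. The only idea taken from the summary is that the same artificial signal is added to both modalities of a matching pair.

```python
def inject_shortcut(image, caption, shortcut_id, num_bits=8):
    """Return copies of (image, caption) that both carry the same identifier.

    Hypothetical formats, chosen only for this sketch:
      image   -- list of rows, each a list of floats in [0, 1]
      caption -- plain string
    """
    if not 0 <= shortcut_id < 2 ** num_bits:
        raise ValueError("shortcut_id does not fit in num_bits")
    if len(image[0]) < num_bits:
        raise ValueError("image is too narrow to hold the bit pattern")

    # Little-endian bit pattern of the identifier.
    bits = [(shortcut_id >> b) & 1 for b in range(num_bits)]

    # Copy the image so the original data stays untouched.
    new_image = [row[:] for row in image]
    # Image side: overwrite the first num_bits pixels of the top row.
    for col, bit in enumerate(bits):
        new_image[0][col] = float(bit)

    # Text side: append the identifier as an extra token.
    new_caption = f"{caption} {shortcut_id}"
    return new_image, new_caption


blank = [[0.0] * 8 for _ in range(2)]
marked_image, marked_caption = inject_shortcut(blank, "a dog on a beach", 5)
```

After injection, a model can match the pair by reading the identifier alone, so retrieval performance on data without the identifier indicates how much of the remaining caption content the model actually encoded.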

Abstract

Vision-language models (VLMs) mainly rely on contrastive training to learn general-purpose representations of images and captions. We focus on the situation when one image is associated with several captions, each caption containing both information shared among all captions and unique information per caption about the scene depicted in the image. In such cases, it is unclear whether contrastive losses are sufficient for learning task-optimal representations that contain all the information provided by the captions or whether the contrastive learning setup encourages the learning of a simple shortcut that minimizes contrastive loss. We introduce synthetic shortcuts for vision-language: a training and evaluation framework where we inject synthetic shortcuts into image-text data. We show that contrastive VLMs trained from scratch or fine-tuned with data containing these synthetic…
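
The concern raised in the abstract, that a simple shortcut can minimize a contrastive loss on its own, can be illustrated with a toy calculation. The sketch below computes a standard image-to-text InfoNCE loss in plain Python and compares embeddings that encode only a unique per-pair identifier with embeddings that carry no pair-specific signal. The function names, the one-hot embeddings, and the temperature of 0.1 are assumptions made for this example; the paper's exact loss and training setup are not given in this summary.

```python
import math


def info_nce(image_embs, text_embs, temperature=0.1):
    """Mean image-to-text InfoNCE loss; caption i is the only positive for image i."""
    n = len(image_embs)
    total = 0.0
    for i in range(n):
        # Dot-product similarity between image i and every caption in the batch.
        logits = [
            sum(a * b for a, b in zip(image_embs[i], text_embs[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n


def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


n = 4
# Encoder that keeps only the shared identifier of each pair and nothing else.
shortcut_only = [one_hot(i, n) for i in range(n)]
loss_shortcut = info_nce(shortcut_only, shortcut_only)

# Encoder whose output is identical for every input (no pair-specific signal).
uninformative = [[1.0, 0.0, 0.0, 0.0] for _ in range(n)]
loss_uninformative = info_nce(uninformative, uninformative)
```

In this toy setting the identifier-only embeddings drive the loss close to zero, while the uninformative embeddings give a loss of log(4), the value for a uniform guess over four candidates. Nothing in the loss itself rewards encoding the rest of the caption once the identifier separates the pairs.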

Code & Models

Repositories

mauritsbleeker/svl-framework (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling

Methods: Focus · Contrastive Learning