The in-context inductive biases of vision-language models differ across modalities

Kelsey Allen; Ishita Dasgupta; Eliza Kosoy; Andrew K. Lampinen

arXiv:2502.01530 · cs.CV · March 17, 2025

Open Access

TL;DR

This paper investigates how vision-language models exhibit different inductive biases depending on whether stimuli are presented visually or in text, revealing modality-specific generalization patterns and their implications.

Contribution

It introduces experimental paradigms to compare how vision-language models generalize across modalities, highlighting differences in shape bias and the influence of textual descriptions.

Findings

01 Models show a shape bias, especially with visual stimuli.

02 When stimuli are presented in text, adjective order affects generalization.

03 Bias effects vary across models and experimental paradigms.

Abstract

Inductive biases are what allow learners to make guesses in the absence of conclusive evidence. These biases have often been studied in cognitive science using concepts or categories -- e.g. by testing how humans generalize a new category from a few examples that leave the category boundary ambiguous. We use these approaches to study generalization in foundation models during in-context learning. Modern foundation models can condition on both vision and text, and differences in how they interpret and learn from these different modalities are an emerging area of study. Here, we study how their generalizations vary by the modality in which stimuli are presented, and the way the stimuli are described in text. We study these biases with three different experimental paradigms, across three different vision-language models. We find that the models generally show some bias towards generalizing…

Taxonomy

Topics: Categorization, perception, and language · Language, Metaphor, and Cognition