Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition

Xinyu Tian; Shu Zou; Zhaoyuan Yang; Mengqi He; Jing Zhang

arXiv:2502.15809·cs.LG·February 25, 2025

Open Access

TL;DR

This paper shows that vision-language models over-rely on spuriously correlated attributes and proposes methods to filter out and shield against those attributes, improving out-of-distribution generalization.

Contribution

It introduces Spurious Attribute Probing and Spurious Attribute Shielding to detect and mitigate biased attributes, enhancing model robustness without sacrificing downstream performance.

Findings

01 SAP and SAS improve accuracy on distribution shifts

02 Achieve state-of-the-art results across 11 datasets

03 Enhance generalization without harming downstream tasks

Abstract

Few-shot adaptation for Vision-Language Models (VLMs) presents a dilemma: balancing in-distribution accuracy with out-of-distribution generalization. Recent research has utilized low-level concepts such as visual attributes to enhance generalization. However, this study reveals that VLMs rely excessively on a small subset of attributes in decision-making. These attributes co-occur with the category but are not inherently part of it, and are termed spuriously correlated attributes. This biased nature of VLMs results in poor generalization. To address this, 1) we first propose Spurious Attribute Probing (SAP), identifying and filtering out these problematic attributes to significantly enhance the generalization of existing attribute-based methods; 2) we introduce Spurious Attribute Shielding (SAS), a plug-and-play module that mitigates the influence of these attributes on prediction, seamlessly integrating into…
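
To make the idea of filtering spuriously correlated attributes concrete, here is a minimal toy sketch. It is not the paper's SAP procedure: the abstract does not say how spurious attributes are detected, so the `flagged` set below is supplied by hand. The scoring rule (mean cosine similarity between an image embedding and each class's attribute embeddings), the function names, and the 4-dimensional embeddings are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_scores(image_emb, attrs_by_class, flagged=frozenset()):
    """Score each class as the mean similarity to its attribute embeddings.

    Attributes listed in `flagged` as (class, attribute) pairs are skipped.
    The flagged set is an input here; how to find it is not shown.
    """
    scores = {}
    for cls, attrs in attrs_by_class.items():
        kept = [emb for name, emb in attrs.items() if (cls, name) not in flagged]
        if not kept:
            raise ValueError(f"all attributes of {cls!r} were filtered out")
        scores[cls] = sum(cosine(image_emb, e) for e in kept) / len(kept)
    return scores


# Hypothetical embedding axes: [hump, horns, sand, grass].
attrs_by_class = {
    "camel": {"hump": [1, 0, 0, 0], "sand": [0, 0, 1, 0]},
    "cow": {"horns": [0, 1, 0, 0], "grass": [0, 0, 0, 1]},
}

# A cow photographed in a desert: clear horns, lots of sand, no grass.
image_emb = [0.2, 0.6, 0.9, 0.0]

# Hand-picked background attributes that co-occur with the class
# but are not part of the animal itself.
flagged = frozenset({("camel", "sand"), ("cow", "grass")})

unfiltered = class_scores(image_emb, attrs_by_class)
filtered = class_scores(image_emb, attrs_by_class, flagged)

print(max(unfiltered, key=unfiltered.get))
print(max(filtered, key=filtered.get))
```

In this toy setup the background attribute "sand" dominates the unfiltered score and pulls the prediction toward "camel". Once the hand-flagged background attributes are removed, the remaining attribute "horns" decides the prediction in favor of "cow".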

Taxonomy

Topics: Multimodal Machine Learning Applications