Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model

Huan Ma; Yan Zhu; Changqing Zhang; Peilin Zhao; Baoyuan Wu; Long-Kai Huang; Qinghua Hu; Bingzhe Wu

arXiv:2403.00376·cs.CV·January 15, 2025·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces Spurious Feature Eraser, a test-time prompt tuning method that enhances vision-language models' robustness by removing spurious features, thereby improving their generalization on downstream tasks.

Contribution

The paper proposes a novel test-time prompt tuning approach to erase spurious features, improving the generalization of vision-language models like CLIP on downstream tasks.

Findings

01. Significant performance improvements over existing methods.

02. Effective suppression of decision shortcuts during inference.

03. Enhanced reliance on invariant causal features.

Abstract

Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired data. However, these models also display significant limitations when applied to downstream tasks, such as fine-grained image classification, as a result of "decision shortcuts" that hinder their generalization capabilities. In this work, we find that the CLIP model possesses a rich set of features, encompassing both desired invariant causal features and undesired decision shortcuts. Moreover, the underperformance of CLIP on downstream tasks originates from its inability to effectively utilize pre-trained features in accordance with specific task requirements. To address this challenge, we propose a simple yet effective method, Spurious Feature Eraser (SEraser), to alleviate the decision shortcuts by…
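
To make the distinction between invariant features and decision shortcuts concrete, here is a toy sketch of a CLIP-style zero-shot classifier whose prediction is hijacked by a background cue. It is not SEraser: the abstract is cut off before the method is described, so the projection step below is only a stand-in for "ignoring the shortcut". The 3-d embeddings, the `BACKGROUND` direction, and all function names are hypothetical and chosen purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity, the score a CLIP-style zero-shot classifier ranks by."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def project_out(v, direction):
    """Remove the component of v that lies along `direction`."""
    scale = sum(a * b for a, b in zip(v, direction)) / sum(b * b for b in direction)
    return [a - scale * b for a, b in zip(v, direction)]

def zero_shot_predict(image_emb, class_embs):
    """Return the class whose text embedding is most similar to the image."""
    scores = {name: cosine(image_emb, emb) for name, emb in class_embs.items()}
    return max(scores, key=scores.get)

# Hypothetical embeddings: axes 0 and 1 carry object evidence (the invariant
# feature), axis 2 carries background evidence (the shortcut).
CLASS_EMBS = {
    "waterbird": [1.0, 0.0, 1.0],   # text embedding entangled with "water"
    "landbird":  [0.0, 1.0, -1.0],  # text embedding entangled with "land"
}
BACKGROUND = [0.0, 0.0, 1.0]  # assumed-known shortcut direction (toy only)

# A landbird photographed over water: the object evidence favours "landbird",
# but the background component is larger and points toward "waterbird".
image = [0.2, 0.8, 1.5]

# Using all features, the background shortcut decides the prediction.
shortcut_pred = zero_shot_predict(image, CLASS_EMBS)

# Dropping the shortcut direction from both sides leaves only object evidence.
erased_image = project_out(image, BACKGROUND)
erased_classes = {k: project_out(v, BACKGROUND) for k, v in CLASS_EMBS.items()}
erased_pred = zero_shot_predict(erased_image, erased_classes)

print(shortcut_pred, erased_pred)
```

In this toy setup the features needed for the correct answer are already present in the embedding; the error comes only from which features the similarity score is allowed to use, which mirrors the abstract's claim that the pre-trained features are rich but not used according to the task.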

Code & Models

Repositories

mahuanaaa/intta (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training