Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations

Chenyu You; Yifei Min; Weicheng Dai; Jasjeet S. Sekhon; Lawrence Staib; James S. Duncan

arXiv:2403.07241 · cs.CV · November 5, 2024 · 1 citation

TL;DR

This paper proposes a lightweight, group-robustness calibration method for CLIP that mitigates reliance on spurious features without needing group annotations, improving generalization across diverse tasks.

Contribution

It introduces a novel representation calibration approach using contrastive learning on a calibration set, enhancing group robustness without group labels.
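To make the idea of contrastive calibration concrete, here is a minimal sketch of a generic supervised contrastive loss over feature vectors from a small labeled calibration set. This is an illustration only: the function names, the temperature value, and the choice of positives (samples sharing a class label) are assumptions, and the paper's actual calibration-set construction and objective are not described on this page.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (hypothetical helper, not the paper's objective).

    For each anchor, samples with the same class label are treated as positives
    and all other samples as negatives. No group annotations are used.
    """
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Average negative log-probability of picking a positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy calibration set: two classes in a 2-D feature space.
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scattered = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
toy_labels = [0, 0, 1, 1]
loss_clustered = supervised_contrastive_loss(clustered, toy_labels)
loss_scattered = supervised_contrastive_loss(scattered, toy_labels)
```

In this toy setup the loss is lower when same-class features sit close together, which is the behavior a calibration step of this kind would push the representation toward. In practice the features would come from a frozen or lightly tuned CLIP encoder rather than hand-written vectors.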

Findings

01. Significant reduction in reliance on spurious features.

02. Improved generalization across multiple benchmarks.

03. Effective calibration without group annotations.

Abstract

Fine-tuning pre-trained vision-language models, like CLIP, has yielded success on diverse downstream tasks. However, several pain points persist for this paradigm: (i) directly tuning entire pre-trained models is both time-intensive and computationally costly, and the tuned models tend to become highly specialized, limiting their practicality for real-world deployment; (ii) recent studies indicate that pre-trained vision-language classifiers may overly depend on spurious features, that is, patterns that correlate with the target in training data but are not related to the true labeling function; and (iii) existing studies on mitigating the reliance on spurious features, largely based on the assumption that we can identify such features, do not provide definitive assurance for real-world applications. As a pilot study, this work focuses on exploring mitigating the…

Code & Models

Repositories

charlesyou999648/cfr (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training