What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights

Xin Wen; Bingchen Zhao; Yilun Chen; Jiangmiao Pang; Xiaojuan Qi

arXiv:2405.21070·cs.CV·October 29, 2024

TL;DR

This study investigates why CLIP pre-trained on web-scale datasets is more robust to data imbalance than supervised learning, revealing that its dynamic classification task and descriptive language supervision contribute to its generalizability.

Contribution

The paper provides controlled experiments uncovering mechanisms behind CLIP's robustness and offers transferable insights applicable to various learning paradigms.

Findings

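01 CLIP's pretext task forms a dynamic classification problem that isolates the bias from dominant classes.

02 Robustness and discriminability improve with more descriptive language supervision, larger data scale, and broader open-world concepts.

03 Models trained on imbalanced data can reach CLIP-level performance.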

Abstract

Severe data imbalance naturally exists among web-scale vision-language datasets. Despite this, we find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning, and demonstrates significant effectiveness in learning generalizable representations. With an aim to investigate the reasons behind this finding, we conduct controlled experiments to study various underlying factors, and reveal that CLIP's pretext task forms a dynamic classification problem wherein only a subset of classes is present in training. This isolates the bias from dominant classes and implicitly balances the learning signal. Furthermore, the robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts, which are inaccessible to supervised learning. Our study not only uncovers the…
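
The "dynamic classification problem" described in the abstract can be illustrated with a toy comparison. The sketch below is not the authors' implementation: it assumes a single sample with hand-picked logits, treats the labels seen in one batch as the only candidate classes, and uses hypothetical function names (`full_vocabulary_loss`, `batch_subset_loss`). It only shows how a dominant class that is absent from the batch drops out of the softmax denominator.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of one sample, computed with a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target_index]


def full_vocabulary_loss(logits, label):
    """Supervised-style loss: every class competes on every step."""
    return softmax_cross_entropy(logits, label)


def batch_subset_loss(logits, label, batch_labels):
    """Loss over only the classes present in the current batch.

    Assumption for illustration: the candidate set is the set of labels
    in the batch, a simplification of contrasting against in-batch captions.
    """
    present = sorted(set(batch_labels))
    if label not in present:
        raise ValueError("the sample's own label must be in the batch")
    subset_logits = [logits[c] for c in present]
    return softmax_cross_entropy(subset_logits, present.index(label))


# Toy setup (made-up numbers): class 0 is a dominant head class with a
# large logit; the sample actually belongs to tail class 3.
logits = [5.0, 1.0, 0.5, 2.0]
label = 3
batch_labels = [3, 1, 2]  # class 0 happens to be absent from this batch

full = full_vocabulary_loss(logits, label)
subset = batch_subset_loss(logits, label, batch_labels)
print(f"full vocabulary loss: {full:.3f}")
print(f"batch subset loss:    {subset:.3f}")
```

In this toy case the subset loss is smaller because the head class's logit no longer contributes to the denominator; when every class appears in the batch, the two losses coincide.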

Code & Models

Repositories

cvmi-lab/clip-beyond-tail
PyTorch · Official

Taxonomy

Topics: Semantic Web and Ontologies · Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications

Methods: Contrastive Language-Image Pre-training