Distributionally Robust Classification on a Data Budget

Benjamin Feuer; Ameya Joshi; Minh Pham; Chinmay Hegde

arXiv:2308.03821 · cs.CV · August 9, 2023

Open Access · 1 Repo

TL;DR

This paper shows that a standard ResNet-50 trained with the cross-entropy loss on 2.4 million images can attain distributional robustness comparable to a CLIP ResNet-50 trained on 400 million samples, challenging the assumption that robustness requires hundreds of millions of training samples.

Contribution

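Introduces JANuS (Joint Annotations and Names Set), a collection of four new training datasets with images, labels, and corresponding captions, and uses it in controlled experiments to show that robust image classification is possible with far less data than large-scale models such as CLIP use.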

Findings

01

A standard ResNet-50 trained with the cross-entropy loss on 2.4 million image samples attains robustness comparable to a CLIP ResNet-50 trained on 400 million samples.

02

Presented by the authors as the first demonstration of near state-of-the-art distributional robustness on a limited data budget.

03

Provides datasets and code for reproducibility.

Abstract

Real-world uses of deep learning require predictable model behavior under distribution shifts. Models such as CLIP show emergent natural distributional robustness comparable to humans, but may require hundreds of millions of training samples. Can we train robust learners in a domain where data is limited? To rigorously address this question, we introduce JANuS (Joint Annotations and Names Set), a collection of four new training datasets with images, labels, and corresponding captions, and perform a series of carefully controlled investigations of factors contributing to robustness in image classification, then compare those results to findings derived from a large-scale meta-analysis. Using this approach, we show that a standard ResNet-50 trained with the cross-entropy loss on 2.4 million image samples can attain comparable robustness to a CLIP ResNet-50 trained on 400 million samples. To…

Code & Models

Repositories

penfever/vlhub
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)

Methods: Contrastive Language-Image Pre-training