Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski

TL;DR
This paper reports that large-scale self-supervised training on uncurated images yields models that are more robust, fairer, and less biased, and that capture diverse semantic and stylistic information without supervision.
Contribution
It trains a dense 10-billion-parameter model on billions of uncurated images without supervision, producing more robust and fairer models that learn diverse, salient visual information.
Findings
Models trained on uncurated images outperform supervised models in fairness and robustness.
The approach captures artistic styles, geolocations, and multilingual embeddings from visual content.
The resulting models are less biased and more equitable across diverse benchmarks.
Abstract
Discriminative self-supervised learning allows training models on any random group of internet images, and possibly recover salient information that helps differentiate between the images. Applied to ImageNet, this leads to object centric features that perform on par with supervised features on most object-centric downstream tasks. In this work, we question if using this ability, we can learn any salient and more representative information present in diverse unbounded set of images from across the globe. To do so, we train models on billions of random images without any data pre-processing or prior assumptions about what we want the model to learn. We scale our model size to dense 10 billion parameters to avoid underfitting on a large data size. We extensively study and validate our model performance on over 50 benchmarks including fairness, robustness to distribution shift,…
Code & Models
- 🤗 facebook/regnet-y-320-seer · model · 8 dl · ♡ 2
- 🤗 facebook/regnet-y-320-seer-in1k · model · 16 dl
- 🤗 facebook/regnet-y-640-seer · model · 4 dl
- 🤗 facebook/regnet-y-1280-seer · model · 4 dl
- 🤗 facebook/regnet-y-640-seer-in1k · model · 15 dl
- 🤗 facebook/regnet-y-1280-seer-in1k · model · 15 dl · ♡ 1
- 🤗 timm/regnety_320.seer · model · 52 dl
- 🤗 timm/regnety_320.seer_ft_in1k · model · 60 dl
- 🤗 timm/regnety_640.seer · model · 89 dl
- 🤗 timm/regnety_640.seer_ft_in1k · model · 67 dl
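The timm checkpoints listed above can probably be loaded as feature extractors along the following lines. This is a minimal sketch, not the authors' own loading code: it assumes the `timm` package is installed and that the checkpoint names in the listing resolve as `architecture.tag` model names in the installed timm version. The helper `load_seer_backbone` and the constant `SEER_TIMM_CHECKPOINTS` are hypothetical names introduced for illustration.

```python
# Checkpoint names copied from the timm entries in the listing above.
# Assumption: they are accepted by timm.create_model as "architecture.tag".
SEER_TIMM_CHECKPOINTS = (
    "regnety_320.seer",
    "regnety_320.seer_ft_in1k",
    "regnety_640.seer",
    "regnety_640.seer_ft_in1k",
)


def load_seer_backbone(name="regnety_320.seer", pretrained=False):
    """Hypothetical helper: build a RegNetY backbone through timm.

    With pretrained=True, timm attempts to download the weights,
    which are large for these models.
    """
    if name not in SEER_TIMM_CHECKPOINTS:
        raise ValueError(f"unknown SEER checkpoint: {name!r}")
    # Imported lazily so the name check above works without timm installed.
    import timm

    # num_classes=0 removes the classifier head, so the model
    # returns pooled features instead of class logits.
    return timm.create_model(name, pretrained=pretrained, num_classes=0)
```

Calling `load_seer_backbone("regnety_320.seer", pretrained=True).eval()` and passing a batch of shape `(N, 3, 224, 224)` should then yield one pooled feature vector per image. The 224-pixel input size is an assumption here, so check the model's own configuration before relying on it.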
Videos
[ML News] DeepMind controls fusion | Yann LeCun's JEPA architecture | US: AI can't copyright its art· youtube
SEER explained: Vision Models more Robust & Fair when pretrained on UNCURATED images!?· youtube
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection · Face recognition and analysis
