Revisiting Weakly Supervised Pre-Training of Visual Perception Models
Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Dollár, Laurens van der Maaten

TL;DR
This paper demonstrates that weakly supervised pre-training using hashtags can outperform self-supervised methods in visual recognition tasks, offering a promising alternative to traditional fully supervised approaches.
Contribution
It introduces SWAG, a weakly supervised pre-training method using hashtags, and shows its competitive performance against self-supervised models across various transfer-learning settings.
Findings
Weakly supervised models outperform self-supervised counterparts.
SWAG achieves strong transfer-learning performance.
Models do not learn harmful stereotypes.
Abstract
Model pre-training is a cornerstone of modern visual recognition systems. Although fully supervised pre-training on datasets like ImageNet is still the de-facto standard, recent studies suggest that large-scale weakly supervised pre-training can outperform fully supervised approaches. This paper revisits weakly-supervised pre-training of models using hashtag supervision with modern versions of residual networks and the largest-ever dataset of images and corresponding hashtags. We study the performance of the resulting models in various transfer-learning settings including zero-shot transfer. We also compare our models with those obtained via large-scale self-supervised learning. We find our weakly-supervised models to be very competitive across all settings, and find they substantially outperform their self-supervised counterparts. We also include an investigation into whether our…
Code & Models
- 🤗 timm/regnety_160.swag_ft_in1k (model, 318 downloads)
- 🤗 timm/regnety_160.swag_lc_in1k (model, 8.3k downloads)
- 🤗 timm/regnety_320.swag_ft_in1k (model, 393 downloads)
- 🤗 timm/regnety_320.swag_lc_in1k (model, 302 downloads)
- 🤗 timm/regnety_1280.swag_ft_in1k (model, 56 downloads)
- 🤗 timm/regnety_1280.swag_lc_in1k (model, 54 downloads)
- 🤗 mlx-vision/vit_base_patch16_224.swag_lin-mlxim (model, 4 downloads)
- 🤗 mlx-vision/vit_base_patch16_384.swag_e2e-mlxim (model, 11 downloads)
- 🤗 mlx-vision/vit_large_patch16_512.swag_e2e-mlxim (model, 7 downloads, 1 like)
- 🤗 mlx-vision/vit_huge_patch14_224.swag_lin-mlxim (model, 6 downloads)
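The checkpoint names listed above encode the hub namespace, the backbone architecture, and a tag describing how the SWAG-pretrained trunk was adapted. The small sketch below splits such an ID into those parts. The tag meanings in `TAG_MEANINGS` are an assumption based on common timm and mlx-image naming conventions (`ft`/`e2e` for end-to-end fine-tuning, `lc`/`lin` for a linear classifier on a frozen trunk); they are not stated on this page, and `parse_checkpoint_id` and `CheckpointId` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass

# ASSUMPTION: tag meanings inferred from timm / mlx-image naming conventions,
# not confirmed by this page.
TAG_MEANINGS = {
    "swag_ft_in1k": "SWAG pre-training, end-to-end fine-tuned on ImageNet-1k",
    "swag_lc_in1k": "SWAG pre-training, linear classifier on ImageNet-1k (frozen trunk)",
    "swag_e2e": "SWAG pre-training, end-to-end fine-tuned",
    "swag_lin": "SWAG pre-training, linear classifier only (frozen trunk)",
}

MLX_SUFFIX = "-mlxim"  # trailing format marker on the mlx-vision repo names


@dataclass(frozen=True)
class CheckpointId:
    """Hypothetical container for the parts of a checkpoint repo ID."""

    hub_namespace: str  # e.g. "timm" or "mlx-vision"
    architecture: str   # e.g. "regnety_160" or "vit_base_patch16_224"
    tag: str            # e.g. "swag_ft_in1k" or "swag_lin"

    @property
    def description(self) -> str:
        # Fall back to a neutral label for tags not in the assumed table.
        return TAG_MEANINGS.get(self.tag, "unknown tag")


def parse_checkpoint_id(repo_id: str) -> CheckpointId:
    """Split an ID like 'timm/regnety_160.swag_ft_in1k' into its parts."""
    namespace, _, name = repo_id.partition("/")
    if not name:
        raise ValueError(f"expected 'namespace/name', got {repo_id!r}")
    architecture, sep, tag = name.partition(".")
    if not sep or not tag:
        raise ValueError(f"missing '.tag' suffix in {repo_id!r}")
    # Drop the mlx-image format marker so both hubs share one tag vocabulary.
    if tag.endswith(MLX_SUFFIX):
        tag = tag[: -len(MLX_SUFFIX)]
    return CheckpointId(namespace, architecture, tag)


if __name__ == "__main__":
    for rid in (
        "timm/regnety_160.swag_lc_in1k",
        "mlx-vision/vit_huge_patch14_224.swag_lin-mlxim",
    ):
        ckpt = parse_checkpoint_id(rid)
        print(ckpt.hub_namespace, ckpt.architecture, ckpt.tag, "->", ckpt.description)
```

To actually load one of the timm entries, timm's `timm.create_model("regnety_160.swag_ft_in1k", pretrained=True)` should fetch the corresponding weights from the Hugging Face Hub in recent timm versions; the mlx-vision entries are intended for the separate mlx-image library.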
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
