Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Mark Ibrahim, Adrien Bardes, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, Tom Goldstein

TL;DR
This paper benchmarks a wide range of pretrained models, including CNNs, ViTs, and SSL models, across diverse computer vision tasks to guide practitioners in selecting optimal backbones.
Contribution
It provides a comprehensive large-scale comparison of pretrained backbones across multiple vision tasks, highlighting strengths and weaknesses of different approaches.
Findings
Supervised CNNs pretrained on large datasets perform best on most tasks.
SSL backbones are highly competitive when controlling for architecture and dataset size.
Vision transformers and SSL models show promise but still lag behind supervised CNNs in many scenarios.
Abstract
Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performance increases for a range of systems, it is difficult for practitioners to make informed decisions about which backbone to choose. Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Furthermore, BoB…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Sparse Evolutionary Training · Diffusion
