Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

Micah Goldblum; Hossein Souri; Renkun Ni; Manli Shu; Viraj Prabhu; Gowthami Somepalli; Prithvijit Chattopadhyay; Mark Ibrahim; Adrien Bardes; Judy Hoffman; Rama Chellappa; Andrew Gordon Wilson; Tom Goldstein

arXiv:2310.19909·cs.CV·November 21, 2023·26 cites

TL;DR

This paper benchmarks a wide range of pretrained models, including CNNs, ViTs, and SSL models, across diverse computer vision tasks to guide practitioners in selecting optimal backbones.

Contribution

It provides a comprehensive large-scale comparison of pretrained backbones across multiple vision tasks, highlighting strengths and weaknesses of different approaches.

Findings

01. CNNs pretrained with supervision on large datasets perform best on most tasks among the models considered.

02. SSL backbones are highly competitive when compared on the same architecture and a similarly sized pretraining dataset.

03. Vision transformers and SSL models show promise but still lag behind supervised CNNs in many scenarios.

Abstract

Neural network-based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performance increases for a range of systems, it is difficult for practitioners to make informed decisions about which backbone to choose. Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Furthermore, BoB…

Code & Models

Videos

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training · Diffusion