A ConvNet for the 2020s

Zhuang Liu; Hanzi Mao; Chao-Yuan Wu; Christoph Feichtenhofer; Trevor Darrell; Saining Xie

arXiv:2201.03545·cs.CV·March 3, 2022·171 cites

TL;DR

This paper revisits pure ConvNets, gradually modernizes a standard ResNet, and introduces ConvNeXt, a family of pure ConvNets that competes with Transformers in accuracy and scalability across a range of vision tasks.

Contribution

It systematically modernizes ConvNets to close the performance gap with Transformers, resulting in the ConvNeXt family of models.

Findings

01  ConvNeXt achieves 87.8% ImageNet top-1 accuracy.

02  ConvNeXt outperforms Swin Transformers on COCO detection.

03  ConvNeXt maintains the simplicity and efficiency of standard ConvNets.

Abstract

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually "modernize" a…

Code & Models

Videos

[ML News] ConvNeXt: Convolutions return | China regulates algorithms | Saliency cropping examined · YouTube

ConvNeXt: A ConvNet for the 2020s – Paper Explained (with animations) · YouTube

ConvNeXt: A ConvNet for the 2020s | Paper Explained · YouTube

Taxonomy

Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Large convolutional kernels · Depthwise Convolution · AdamW · ConvNeXt · LayerScale · 1x1 Convolution
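
Several of the methods listed above (large kernels, depthwise convolution, 1x1 convolution) are commonly combined because a depthwise convolution makes a large kernel cheap. The sketch below only counts learnable parameters for three convolution types so the difference is concrete. It is not the paper's implementation: the helper `conv_params`, the channel width of 96, and the kernel size of 7 are illustrative assumptions chosen for this example.

```python
def conv_params(c_in, c_out, kernel, groups=1, bias=True):
    """Count learnable parameters of a 2D convolution layer.

    Hypothetical helper for illustration. Each output channel sees
    c_in // groups input channels through a kernel x kernel window.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


dim = 96     # illustrative channel width (assumption)
kernel = 7   # illustrative "large" kernel size (assumption)

# Standard convolution: every output channel mixes all input channels.
dense = conv_params(dim, dim, kernel)

# Depthwise convolution: groups == channels, so each channel is
# filtered on its own and no information crosses channels.
depthwise = conv_params(dim, dim, kernel, groups=dim)

# 1x1 (pointwise) convolution: mixes channels, no spatial extent.
pointwise = conv_params(dim, dim, 1)

print("dense 7x7:          ", dense)
print("depthwise 7x7:      ", depthwise)
print("pointwise 1x1:      ", pointwise)
print("depthwise + 1x1:    ", depthwise + pointwise)
print("dense / (dw + 1x1): ", round(dense / (depthwise + pointwise), 1))
```

Under these assumed sizes, splitting spatial filtering (depthwise) from channel mixing (1x1) uses a small fraction of the parameters of a dense 7x7 convolution, which is one reason large kernels are typically paired with depthwise layers.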