A ConvNet for the 2020s
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

TL;DR
This paper revisits pure ConvNets: it modernizes the ResNet architecture step by step and introduces ConvNeXt, a family of models that is competitive with Transformers in accuracy and scalability across a range of vision tasks.
Contribution
It systematically modernizes ConvNets to close the performance gap with Transformers, resulting in the ConvNeXt family of models.
Findings
ConvNeXt achieves 87.8% ImageNet top-1 accuracy.
ConvNeXt outperforms Swin Transformers on COCO detection.
ConvNeXt maintains the simplicity and efficiency of standard ConvNets.
Abstract
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually "modernize" a…
Code & Models
- 🤗 facebook/convnext-base-384 · model · 304 dl · ♡ 1
- 🤗 timm/convnext_tiny.fb_in22k · model · 399k dl · ♡ 4
- 🤗 facebook/convnext-base-224-22k-1k · model · 308 dl · ♡ 5
- 🤗 facebook/convnext-base-224-22k · model · 2.0k dl · ♡ 9
- 🤗 facebook/convnext-base-224 · model · 4.8k dl · ♡ 9
- 🤗 facebook/convnext-base-384-22k-1k · model · 698 dl · ♡ 5
- 🤗 facebook/convnext-large-224-22k-1k · model · 426 dl · ♡ 3
- 🤗 facebook/convnext-large-224-22k · model · 285 dl
- 🤗 facebook/convnext-large-224 · model · 931 dl · ♡ 28
- 🤗 facebook/convnext-large-384-22k-1k · model · 137 dl
Videos
[ML News] ConvNeXt: Convolutions return | China regulates algorithms | Saliency cropping examined· youtube
ConvNeXt: A ConvNet for the 2020s – Paper Explained (with animations)· youtube
ConvNeXt: A ConvNet for the 2020s | Paper Explained· youtube
Taxonomy
Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
Methods: Large convolutional kernels · Depthwise Convolution · AdamW · ConvNeXt · LayerScale · 1x1 Convolution
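
To make the methods listed above more concrete, here is a rough parameter-count sketch for a single ConvNeXt-style block. It is an illustration under stated assumptions, not the authors' code: this page does not give the block layout, so the sketch assumes the design described in the paper (a 7x7 depthwise convolution, LayerNorm, a 1x1 convolution expanding the width 4x, a 1x1 convolution projecting back, and a per-channel LayerScale vector). The names `BlockSpec`, `block_param_counts`, and `dense_conv_params` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSpec:
    """Hypothetical description of one ConvNeXt-style block (illustrative only)."""
    dim: int              # channel width of the block
    kernel_size: int = 7  # assumed large depthwise kernel (7x7)
    expansion: int = 4    # assumed expansion ratio of the 1x1 layers


def block_param_counts(spec: BlockSpec) -> dict:
    """Count learnable parameters per component, biases included."""
    d, k, e = spec.dim, spec.kernel_size, spec.expansion
    hidden = e * d
    return {
        # Depthwise: one k x k filter per channel, plus one bias per channel.
        "depthwise_conv": d * k * k + d,
        # LayerNorm: a scale and a shift per channel.
        "layer_norm": 2 * d,
        # 1x1 convolution expanding d -> hidden channels.
        "pointwise_expand": d * hidden + hidden,
        # 1x1 convolution projecting hidden -> d channels.
        "pointwise_project": hidden * d + d,
        # LayerScale: one learnable scalar per channel on the residual branch.
        "layer_scale": d,
    }


def dense_conv_params(dim: int, kernel_size: int) -> int:
    """Parameters of an ordinary (non-depthwise) k x k convolution, dim -> dim."""
    return dim * dim * kernel_size * kernel_size + dim


if __name__ == "__main__":
    spec = BlockSpec(dim=96)  # 96 is assumed here as a typical first-stage width
    counts = block_param_counts(spec)
    for name, n in counts.items():
        print(f"{name:>18}: {n:,}")
    print(f"{'block total':>18}: {sum(counts.values()):,}")
    print(f"{'dense 7x7 conv':>18}: {dense_conv_params(spec.dim, spec.kernel_size):,}")
```

With `dim=96`, the whole block comes to 79,296 parameters, while a single dense 7x7 convolution at the same width would need 451,680. This is why a large kernel is affordable when the convolution is depthwise: the spatial filter costs only 4,800 parameters, and almost all of the block's capacity sits in the two 1x1 convolutions.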
