High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan

TL;DR
This paper introduces Normalizer-Free ResNets trained with adaptive gradient clipping. Without any normalization layers, these models reach high accuracy on ImageNet, train faster, and outperform batch-normalized models when fine-tuned.
Contribution
The authors develop an adaptive gradient clipping technique and a new class of Normalizer-Free ResNets that match or surpass batch-normalized models in accuracy and training speed.
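As a rough illustration of the clipping idea, the sketch below shows one common formulation of adaptive gradient clipping: each output unit's gradient is rescaled whenever the ratio of its gradient norm to its weight norm exceeds a threshold. The function names, the assumption that axis 0 indexes output units, and the default values of `clip` and `eps` are illustrative choices for this sketch, not details taken from this page.

```python
import numpy as np


def unitwise_norm(x):
    """Norm per output unit (assumption: axis 0 indexes output units)."""
    if x.ndim <= 1:
        # Scalars and vectors (e.g. biases): use the norm of the whole tensor.
        return np.sqrt(np.sum(x ** 2))
    # Matrices and conv kernels: reduce over every axis except axis 0.
    axes = tuple(range(1, x.ndim))
    return np.sqrt(np.sum(x ** 2, axis=axes, keepdims=True))


def adaptive_grad_clip(grad, weight, clip=0.01, eps=1e-3):
    """Rescale unit-wise gradients whose norm exceeds clip * weight norm.

    `clip` and `eps` are illustrative defaults (assumptions of this sketch).
    """
    # Floor the weight norm so zero-initialized units can still receive updates.
    w_norm = np.maximum(unitwise_norm(weight), eps)
    g_norm = unitwise_norm(grad)
    max_norm = clip * w_norm
    # Guard the division against a zero gradient norm.
    scale = max_norm / np.maximum(g_norm, 1e-6)
    return np.where(g_norm > max_norm, grad * scale, grad)
```

Because the threshold scales with the weight norm, units with small weights are clipped more aggressively than units with large weights, which is the main difference from clipping every gradient to one fixed global norm.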
Findings
The largest models attain a new state-of-the-art top-1 accuracy of 86.5% on ImageNet.
Normalizer-Free models outperform batch-normalized counterparts in fine-tuning.
Smaller models match the test accuracy of EfficientNet-B7 on ImageNet while being up to 8.7x faster to train.
Abstract
Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%. In addition, Normalizer-Free models attain significantly better performance than…
Code & Models
- 🤗 timm/eca_nfnet_l0 (model) · 23 dl · ♡ 1
- 🤗 kadirnar/timm_model_list (model) · ♡ 1
- 🤗 timm/dm_nfnet_f0.dm_in1k (model) · 13k dl · ♡ 1
- 🤗 timm/dm_nfnet_f1.dm_in1k (model) · 5.0k dl
- 🤗 timm/dm_nfnet_f2.dm_in1k (model) · 12k dl
- 🤗 timm/dm_nfnet_f3.dm_in1k (model) · 15k dl
- 🤗 timm/dm_nfnet_f4.dm_in1k (model) · 268 dl
- 🤗 timm/dm_nfnet_f5.dm_in1k (model) · 86 dl
- 🤗 timm/dm_nfnet_f6.dm_in1k (model) · 103 dl
- 🤗 timm/eca_nfnet_l0.ra2_in1k (model) · 14k dl
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Gradient Clipping
