Characterizing signal propagation to close the performance gap in unnormalized ResNets
Andrew Brock, Soham De, Samuel L. Smith

TL;DR
This paper introduces analysis tools for understanding signal propagation in unnormalized ResNets. These tools enable the design of high-performing networks without Batch Normalization that achieve competitive ImageNet results.
Contribution
It presents a novel analysis framework for unnormalized ResNets and demonstrates how to maintain signal integrity without normalization layers using adapted Weight Standardization.
Findings
Unnormalized ResNets can match state-of-the-art performance on ImageNet.
Signal propagation can be preserved without Batch Normalization.
The proposed method is effective across various FLOP budgets.
Abstract
Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain…
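The abstract's claim that Weight Standardization keeps per-channel activation means from growing can be illustrated with a small sketch. This is not the authors' implementation: the summary above does not give the adapted formula, so the code assumes the plain form of Weight Standardization (each output unit's weights are shifted to zero mean and scaled to variance `gain**2 / fan_in`). The `gain` argument, the function names, and the toy data are hypothetical choices made for illustration. Because post-ReLU inputs have a positive mean in every channel, a weight row with a nonzero sum passes that mean on to its output, while a zero-sum row cancels it.

```python
import math
import random


def standardize_weights(weights, gain=1.0, eps=1e-10):
    """Return a copy of `weights` (one row per output unit) in which every
    row has zero mean and variance gain**2 / fan_in.

    Assumption: this is plain Weight Standardization with a generic `gain`;
    the adapted variant used in the paper is not spelled out in this summary.
    """
    standardized = []
    for row in weights:
        fan_in = len(row)
        mean = sum(row) / fan_in
        var = sum((w - mean) ** 2 for w in row) / fan_in
        scale = gain / math.sqrt((var + eps) * fan_in)
        standardized.append([(w - mean) * scale for w in row])
    return standardized


def linear(weights, x):
    """Compute y = W x for a single input vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    """Elementwise ReLU."""
    return [max(0.0, v) for v in x]


def avg_squared_channel_mean(weights, inputs):
    """Average over output units of (mean over the batch of that unit)**2."""
    outputs = [linear(weights, x) for x in inputs]
    n = len(outputs)
    means = [sum(o[i] for o in outputs) / n for i in range(len(weights))]
    return sum(m * m for m in means) / len(means)


random.seed(0)
fan_in, fan_out, batch = 64, 8, 500

# Post-ReLU inputs: every channel has a positive mean.
inputs = [relu([random.gauss(0.0, 1.0) for _ in range(fan_in)])
          for _ in range(batch)]

# Raw weights with a deliberately nonzero mean (toy values, not from the paper).
raw = [[random.gauss(0.5, 1.0) / math.sqrt(fan_in) for _ in range(fan_in)]
       for _ in range(fan_out)]
ws = standardize_weights(raw)

avg_sq_mean_raw = avg_squared_channel_mean(raw, inputs)
avg_sq_mean_ws = avg_squared_channel_mean(ws, inputs)
print("raw weights:         ", avg_sq_mean_raw)
print("standardized weights:", avg_sq_mean_ws)
```

With zero-sum rows, the only remaining output mean comes from sampling noise in the batch, so the second printed value should be far smaller than the first. Stacking many such layers is where the difference would matter, since a mean that is passed on at every layer can accumulate with depth.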
Code & Models
- 🤗 kadirnar/timm_model_list · model · ♡ 1
- 🤗 timm/dm_nfnet_f0.dm_in1k · model · 13k dl · ♡ 1
- 🤗 timm/dm_nfnet_f1.dm_in1k · model · 5.0k dl
- 🤗 timm/dm_nfnet_f2.dm_in1k · model · 12k dl
- 🤗 timm/dm_nfnet_f3.dm_in1k · model · 15k dl
- 🤗 timm/dm_nfnet_f4.dm_in1k · model · 268 dl
- 🤗 timm/dm_nfnet_f5.dm_in1k · model · 86 dl
- 🤗 timm/dm_nfnet_f6.dm_in1k · model · 103 dl
- 🤗 timm/eca_nfnet_l0.ra2_in1k · model · 14k dl
- 🤗 timm/eca_nfnet_l1.ra2_in1k · model · 1.4k dl
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
Methods: Sigmoid Activation · Activation Normalization · Weight Standardization
