Characterizing signal propagation to close the performance gap in unnormalized ResNets

Andrew Brock; Soham De; Samuel L. Smith

arXiv:2101.08692 · cs.LG · January 28, 2021 · 21 cites

TL;DR

This paper introduces analysis tools for characterizing signal propagation in unnormalized ResNets and uses them to design networks without Batch Normalization that achieve competitive ImageNet results.

Contribution

The paper presents an analysis framework for unnormalized ResNets and shows that an adapted version of Weight Standardization preserves the signal on the forward pass without activation normalization layers.

Findings

01. Unnormalized ResNets can reach competitive performance on ImageNet.

02. Signal propagation can be preserved without Batch Normalization.

03. The proposed method is effective across a range of FLOP budgets.

Abstract

Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain…
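To make the abstract's claim about per-channel activation means more concrete, here is a minimal sketch of forward-pass statistics on a toy network. It is not the authors' implementation. The fully connected ReLU stack, the Gaussian input batch, the He-style gain of sqrt(2), and all function names (`make_weights`, `channel_stats`, `propagate`, `mean_fraction`) are assumptions chosen for illustration. The sketch compares plain Gaussian weights against weights whose rows are standardized to zero mean and fixed variance, and reports what fraction of each layer's pre-activation signal sits in the per-channel mean.

```python
import math
import random


def make_weights(rng, fan_out, fan_in, gain, standardize):
    """Sample a (fan_out x fan_in) Gaussian matrix scaled by gain / sqrt(fan_in).

    If standardize is True, each row (one output channel) is first shifted to
    zero mean and rescaled to unit variance, a Weight Standardization style
    step. This is an illustrative variant, not the paper's exact formulation.
    """
    rows = []
    for _ in range(fan_out):
        row = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
        if standardize:
            mean = sum(row) / fan_in
            var = sum((w - mean) ** 2 for w in row) / fan_in
            row = [(w - mean) / math.sqrt(var) for w in row]
        rows.append([w * gain / math.sqrt(fan_in) for w in row])
    return rows


def channel_stats(batch):
    """Return (average squared channel mean, average channel variance).

    Means and variances are taken over the batch for each channel, then
    averaged over channels.
    """
    n, width = len(batch), len(batch[0])
    sq_means, variances = [], []
    for c in range(width):
        col = [example[c] for example in batch]
        mean = sum(col) / n
        sq_means.append(mean ** 2)
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    return sum(sq_means) / width, sum(variances) / width


def propagate(depth, width, batch_size, gain, standardize, seed=0):
    """Push a Gaussian batch through `depth` linear + ReLU layers.

    Returns one (squared mean, variance) pair per layer, measured on the
    pre-activations.
    """
    rng = random.Random(seed)
    h = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(batch_size)]
    stats = []
    for _ in range(depth):
        w = make_weights(rng, width, width, gain, standardize)
        z = [[sum(wi * xi for wi, xi in zip(row, x)) for row in w] for x in h]
        stats.append(channel_stats(z))
        h = [[max(0.0, v) for v in x] for x in z]
    return stats


def mean_fraction(stat):
    """Share of the signal carried by the per-channel mean (scale invariant)."""
    sq_mean, var = stat
    return sq_mean / (sq_mean + var)


gain = math.sqrt(2.0)  # assumed He-style gain for ReLU
plain = propagate(depth=8, width=64, batch_size=64, gain=gain, standardize=False)
ws = propagate(depth=8, width=64, batch_size=64, gain=gain, standardize=True)
for layer, (p, s) in enumerate(zip(plain, ws), start=1):
    print(f"layer {layer}: mean fraction plain={mean_fraction(p):.3f} "
          f"standardized={mean_fraction(s):.3f}")
```

In this toy setting, the plain network's mean fraction should climb with depth, while the standardized network's should stay close to zero, because zero-mean rows cancel any shift shared across the input channels. The sketch uses a scale-invariant ratio and makes no attempt to keep the overall variance constant from layer to layer.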

Videos

W&B Paper Reading Group: ConViT · YouTube

W&B Paper Reading Group: Nf-ResNet · YouTube

Characterizing signal propagation to close the performance gap in unnormalized ResNets · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning

Methods: Sigmoid Activation · Activation Normalization · Weight Standardization