Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs

Jonathan Frankle, David J. Schwab, and Ari S. Morcos

arXiv:2003.00152·cs.LG·March 23, 2021·79 cites

TL;DR

Training only the affine parameters of BatchNorm in CNNs, with all other weights frozen at their random initializations, still yields surprisingly high accuracy, showing how much expressive power lies in scaling and shifting random features.

Contribution

This paper isolates the contribution of BatchNorm's affine parameters by training only those parameters while every other weight stays frozen at its random initialization. The resulting accuracy is surprisingly high given how severely this restricts training.

Findings

01

Sufficiently deep ResNets reach 82% accuracy on CIFAR-10 when only the BatchNorm parameters are trained.

02

When only BatchNorm is trained, the network learns to disable roughly a third of its random features.

03

Training the BatchNorm affine parameters alone outperforms training an equivalent number of parameters chosen at random elsewhere in the network.
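
One way to make finding 02 concrete is to count how many learned scale parameters sit close to zero, since a feature whose scale is near zero passes only a constant to later layers. The sketch below is an illustration under stated assumptions, not the authors' analysis code: the name `fraction_disabled`, the example values, and the cutoff of 0.01 are all hypothetical.

```python
def fraction_disabled(gammas, threshold=0.01):
    """Return the share of scale parameters with magnitude below `threshold`.

    `threshold` is a hypothetical cutoff chosen for illustration; it is not
    a value stated in the summary above.
    """
    if not gammas:
        raise ValueError("gammas must be non-empty")
    disabled = sum(1 for g in gammas if abs(g) < threshold)
    return disabled / len(gammas)


# Made-up scale values for six channels of one hypothetical BatchNorm layer.
example_gammas = [0.93, 0.004, -1.20, 0.0, 0.51, -0.002]
share = fraction_disabled(example_gammas)
print(f"{share:.0%} of channels fall below the cutoff")
```

In a real network the same count would be taken over the scale parameters of every BatchNorm layer, and the result depends on the cutoff chosen.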

Abstract

A wide variety of deep learning techniques from style transfer to multitask learning rely on training affine transformations of features. Most prominent among these is the popular feature normalization technique BatchNorm, which normalizes activations and then subsequently applies a learned affine transform. In this paper, we aim to understand the role and expressive power of affine parameters used to transform features in this way. To isolate the contribution of these parameters from that of the learned features they transform, we investigate the performance achieved when training only these parameters in BatchNorm and freezing all weights at their random initializations. Doing so leads to surprisingly high performance considering the significant limitations that this style of training imposes. For example, sufficiently deep ResNets reach 82% (CIFAR-10) and 32% (ImageNet, top-5)…
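
The normalize-then-affine step described in the abstract can be sketched for a single feature in plain Python. This is a minimal illustration, not the paper's implementation: the function name `batch_norm_1d`, the `eps` value, and the sample activations are assumptions, and only the training-time computation over one batch is shown.

```python
import math


def batch_norm_1d(values, gamma, beta, eps=1e-5):
    """Normalize one feature over a batch, then apply the affine transform.

    gamma (scale) and beta (shift) are the only quantities that would be
    trained in the BatchNorm-only setting; `eps` is an assumed stabilizer.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased batch variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (v - mean) * inv_std + beta for v in values]


batch = [2.0, 4.0, 6.0, 8.0]  # made-up activations of one random feature

scaled = batch_norm_1d(batch, gamma=1.5, beta=0.5)    # rescaled and shifted
disabled = batch_norm_1d(batch, gamma=0.0, beta=0.5)  # every output equals beta
```

With `gamma=0.0` every output equals `beta`, so the feature no longer carries information about the input. This is the sense in which a scale parameter can switch a random feature off.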

Code & Models

Videos

Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs (SlidesLive)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection