Training BatchNorm Only in Neural Architecture Search and Beyond
Yichen Zhu, Jie Du, Yuqin Zhu, Yi Wang, Zhicai Ou, Feifei Feng, and Jian Tang

TL;DR
This paper explores why training only BatchNorm in neural architecture search works well, providing theoretical insights, addressing fairness issues, and proposing a new performance indicator to improve NAS results across benchmarks.
Contribution
It offers a theoretical understanding of train-BN-only supernets, identifies fairness issues in search spaces, and introduces a novel composite performance indicator for NAS.
Findings
Train-BN-only networks converge to the neural tangent kernel regime.
Unfair competition favors convolution operators with BatchNorm.
The proposed performance indicator improves NAS evaluation across benchmarks.
Abstract
This work investigates the use of batch normalization in neural architecture search (NAS). Frankle et al. find that training only BatchNorm can achieve nontrivial performance, and Chen et al. claim that training only BatchNorm can speed up training of the one-shot NAS supernet by more than ten times. However, no prior work has tried to understand 1) why training only BatchNorm can find well-performing architectures with reduced supernet training time, and 2) how the train-BN-only supernet differs from the standard-train supernet. We begin by showing theoretically that train-BN-only networks converge to the neural tangent kernel regime and obtain the same training dynamics as training all parameters. Our proof supports training only BatchNorm on the supernet with less training time. Then, we empirically disclose that train-BN-only supernet…
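To make "training BatchNorm only" concrete, here is a minimal pure-Python sketch of the idea on a toy regression problem. It is not the paper's supernet setup: the tiny network, the sin-based targets, and the names `batchnorm_forward` and `train_bn_only` are all assumptions made for illustration. The random layer weights are frozen, and gradient descent updates only the BatchNorm scale (gamma) and shift (beta).

```python
import math
import random


def batchnorm_forward(z, gamma, beta, eps=1e-5):
    """Normalize each feature column over the batch, then scale and shift."""
    n, d = len(z), len(z[0])
    x_hat = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        var = sum((c - mean) ** 2 for c in col) / n
        for i in range(n):
            x_hat[i][j] = (z[i][j] - mean) / math.sqrt(var + eps)
    y = [[gamma[j] * x_hat[i][j] + beta[j] for j in range(d)] for i in range(n)]
    return x_hat, y


def train_bn_only(steps=200, lr=0.05, n=32, d_in=3, d_hidden=4, seed=0):
    """Toy network: frozen linear -> BatchNorm -> ReLU -> frozen linear."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(d_in)] for _ in range(n)]
    targets = [math.sin(sum(x)) for x in xs]  # hypothetical regression target

    # Frozen random weights: initialized once and never updated.
    w = [[rng.uniform(-1.0, 1.0) for _ in range(d_in)] for _ in range(d_hidden)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(d_hidden)]
    frozen_snapshot = ([row[:] for row in w], v[:])

    # The only trainable parameters: BatchNorm scale and shift.
    gamma = [1.0] * d_hidden
    beta = [0.0] * d_hidden

    # Pre-activations of the frozen first layer (constant for a fixed batch).
    z = [[sum(w[j][k] * x[k] for k in range(d_in)) for j in range(d_hidden)]
         for x in xs]

    losses = []
    for _ in range(steps):
        x_hat, y = batchnorm_forward(z, gamma, beta)
        out = [sum(v[j] * max(y[i][j], 0.0) for j in range(d_hidden))
               for i in range(n)]
        losses.append(sum((o - t) ** 2 for o, t in zip(out, targets)) / n)

        # Backpropagate the mean-squared error to gamma and beta only.
        grad_gamma = [0.0] * d_hidden
        grad_beta = [0.0] * d_hidden
        for i in range(n):
            d_out = 2.0 * (out[i] - targets[i]) / n
            for j in range(d_hidden):
                if y[i][j] > 0.0:  # ReLU passes gradient only where active
                    d_y = d_out * v[j]
                    grad_gamma[j] += d_y * x_hat[i][j]
                    grad_beta[j] += d_y

        gamma = [g - lr * gg for g, gg in zip(gamma, grad_gamma)]
        beta = [b - lr * gb for b, gb in zip(beta, grad_beta)]

    return {
        "losses": losses,
        "gamma": gamma,
        "beta": beta,
        "frozen_unchanged": (w, v) == frozen_snapshot,
    }


if __name__ == "__main__":
    result = train_bn_only()
    print("first loss:", result["losses"][0])
    print("last loss:", result["losses"][-1])
    print("frozen weights unchanged:", result["frozen_unchanged"])
```

In a deep-learning framework the same effect is usually obtained by disabling gradients for every parameter except the BatchNorm affine terms. The hand-written gradients here only serve to show that nothing else needs to be updated.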
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Neural Networks and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Batch Normalization · Convolution
