Weight Rescaling: Effective and Robust Regularization for Deep Neural Networks with Batch Normalization

Ziquan Liu; Yufei Cui; Jia Wan; Yu Mao; Antoni B. Chan

arXiv:2102.03497 · cs.LG · June 22, 2022 · 1 citation

TL;DR

This paper introduces Weight Rescaling (WRS), a simple regularization method for deep neural networks with batch normalization. It addresses the shortcomings of weight decay by controlling weight norms directly, with the aim of improving generalization and robustness to hyperparameter choices across various vision tasks.

Contribution

The paper proposes WRS, a weight-norm regularization scheme that is reported to outperform weight decay, weight standardization, and AdamP in both effectiveness and robustness to hyperparameter choices.

Findings

01  WRS improves generalization across multiple vision tasks.

02  WRS is more robust to hyperparameter choices than weight decay.

03  WRS outperforms weight decay, weight standardization, and AdamP in experiments.

Abstract

Weight decay is often used to ensure good generalization when training deep neural networks with batch normalization (BN-DNNs), where some convolution layers are invariant to weight rescaling due to the normalization. In this paper, we demonstrate that the practical use of weight decay still has unsolved problems, despite existing theoretical work explaining the effect of weight decay in BN-DNNs. On the one hand, when a non-adaptive learning rate optimizer, e.g., SGD with momentum (SGDM), is used, the effective learning rate continues to increase even after the initial training stage, which leads to an overfitting effect in many neural architectures. On the other hand, in both SGDM and adaptive learning rate optimizers, e.g., Adam, the effect of weight decay on generalization is quite sensitive to the hyperparameter. Thus, finding an optimal weight decay parameter requires…
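
The scale invariance mentioned in the abstract can be checked numerically. The sketch below is not taken from the paper: it assumes a toy setup (a single linear unit followed by batch normalization without affine terms, a squared-error loss, and finite-difference gradients), and every name in it (bn_linear, loss, num_grad, alpha) is hypothetical. It illustrates two standard properties of a layer followed by BN: multiplying the weights by a positive constant leaves the output unchanged, and the gradient shrinks by the same constant. The second property is why the step size seen by the weight direction is commonly described as scaling like lr / ||w||^2, so a shrinking weight norm raises the effective learning rate.

```python
import math
import random


def bn_linear(w, xs):
    """One linear unit followed by batch normalization (no affine terms, eps = 0)."""
    z = [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]
    mean = sum(z) / len(z)
    var = sum((zi - mean) ** 2 for zi in z) / len(z)
    return [(zi - mean) / math.sqrt(var) for zi in z]


def loss(w, xs, ys):
    """Mean squared error between the normalized outputs and the targets."""
    out = bn_linear(w, xs)
    return sum((o - y) ** 2 for o, y in zip(out, ys)) / len(ys)


def num_grad(w, xs, ys, h=1e-6):
    """Central finite-difference gradient of the loss with respect to w."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad.append((loss(w_plus, xs, ys) - loss(w_minus, xs, ys)) / (2 * h))
    return grad


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


# Toy data: 16 samples with 4 features each (arbitrary illustrative sizes).
random.seed(0)
xs = [[random.gauss(0, 1) for _ in range(4)] for _ in range(16)]
ys = [random.gauss(0, 1) for _ in range(16)]
w = [random.gauss(0, 1) for _ in range(4)]

alpha = 3.0  # arbitrary positive rescaling factor
w_scaled = [alpha * wi for wi in w]

# 1) The output after BN is unchanged when the weights are rescaled.
max_output_gap = max(
    abs(a - b) for a, b in zip(bn_linear(w, xs), bn_linear(w_scaled, xs))
)

# 2) The gradient at alpha * w equals the gradient at w divided by alpha.
g = num_grad(w, xs, ys)
g_scaled = num_grad(w_scaled, xs, ys)
max_grad_gap = max(abs(alpha * gs - gi) for gs, gi in zip(g_scaled, g))

# 3) The gradient has no component along w, so a plain gradient step
#    cannot shrink the weight norm.
radial_component = sum(gi * wi for gi, wi in zip(g, w))

# 4) Rescaling w to unit norm is a special case of (1): the function is preserved.
w_unit = [wi / norm(w) for wi in w]
max_unit_gap = max(
    abs(a - b) for a, b in zip(bn_linear(w, xs), bn_linear(w_unit, xs))
)

print(max_output_gap, max_grad_gap, radial_component, max_unit_gap)
```

All four printed quantities should be zero up to floating-point and finite-difference error. The last check only shows that resetting a weight vector's norm does not change what such a layer computes; how and when WRS itself rescales weights is not specified in this summary.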

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Weight Decay · Batch Normalization · Stochastic Gradient Descent · Convolution