Implicit Regularization and Convergence for Weight Normalization

Xiaoxia Wu; Edgar Dobriban; Tongzheng Ren; Shanshan Wu; Zhiyuan Li; Suriya Gunasekar; Rachel Ward; Qiang Liu

arXiv:1911.07956 · cs.LG · August 31, 2022 · 5 cites

Open Access · 1 Video

TL;DR

This paper analyzes weight normalization (WN) and a variant called reparametrized projected gradient descent (rPGD), showing that both act as implicit regularizers that guide convergence towards minimum-norm solutions in overparametrized least-squares regression and are less sensitive to initialization than gradient descent.

Contribution

It demonstrates that weight normalization and rPGD implicitly regularize and converge near minimum norm solutions, differing from standard gradient descent behavior.

Findings

01 Weight normalization and rPGD regularize weights adaptively.

02 These methods converge close to minimum norm solutions.

03 They are less sensitive to initialization than gradient descent.

Abstract

Normalization methods such as batch [Ioffe and Szegedy, 2015], weight [Salimans and Kingma, 2016], instance [Ulyanov et al., 2016], and layer normalization [Ba et al., 2016] have been widely used in modern machine learning. Here, we study the weight normalization (WN) method [Salimans and Kingma, 2016] and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least-squares regression. WN and rPGD reparametrize the weights with a scale g and a unit vector w, and thus the objective function becomes non-convex. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. These methods adaptively regularize the weights and converge close to the minimum l2 norm solution, even for initializations far from zero. For certain stepsizes of g and w, we show that they can converge close to the…
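The setup described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the loss 0.5 * ||Xw - y||^2 with fewer samples than features, the usual weight-normalization form w = g * v / ||v|| (the unnormalized direction `v`, all function names, and all stepsizes here are assumptions made for illustration), and plain gradient descent on (g, v) rather than the rPGD variant.

```python
import numpy as np


def least_squares_loss(w, X, y):
    """0.5 * ||X w - y||^2 for a weight vector w."""
    r = X @ w - y
    return 0.5 * float(r @ r)


def min_norm_solution(X, y):
    """Minimum l2 norm interpolating solution via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def gd(w0, X, y, lr, steps):
    """Plain gradient descent on the original (convex) objective."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * (X.T @ (X @ w - y))
    return w


def wn_gradients(g, v, X, y):
    """Chain-rule gradients of the loss at w = g * v / ||v|| w.r.t. g and v."""
    norm_v = np.linalg.norm(v)
    u = v / norm_v                       # unit direction
    grad_w = X.T @ (X @ (g * u) - y)     # gradient w.r.t. the full weight w
    grad_g = float(u @ grad_w)           # component along the direction
    # Component orthogonal to u, rescaled by g / ||v||.
    grad_v = (g / norm_v) * (grad_w - (u @ grad_w) * u)
    return grad_g, grad_v


def wn_gd(g0, v0, X, y, lr_g, lr_v, steps):
    """Gradient descent on the reparametrized (non-convex) objective.

    Separate stepsizes for g and v are an illustrative choice; the values
    that give the paper's guarantees are not reproduced here.
    """
    g, v = float(g0), np.array(v0, dtype=float)
    for _ in range(steps):
        grad_g, grad_v = wn_gradients(g, v, X, y)
        g = g - lr_g * grad_g
        v = v - lr_v * grad_v
    return g * v / np.linalg.norm(v)
```

One way to probe the abstract's claim is to draw a random `X` with fewer rows than columns, run `gd` and `wn_gd` from the same initialization far from zero, and compare each result's distance to `min_norm_solution(X, y)`. For plain gradient descent the component of the initialization lying in the null space of `X` is never updated, so its limit carries that component along; how close the reparametrized run gets depends on the initial scale and stepsizes, which this sketch does not tune.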

Videos

Implicit Regularization and Convergence for Weight Normalization · SlidesLive

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Face and Expression Recognition

Methods: Layer Normalization · Weight Normalization · Batch Normalization