Robust Implicit Regularization via Weight Normalization
Hung-Hsu Chou, Holger Rauhut, Rachel Ward

TL;DR
This paper demonstrates that weight normalization in overparameterized models induces a robust implicit bias towards sparse solutions, improving convergence speed and robustness even with large initial weights.
Contribution
It introduces a theoretical analysis of weight normalization's implicit bias in gradient flow, showing robustness at large initializations, supported by experimental results.
Findings
Weight normalization induces a sparse implicit bias.
Robust bias persists at large initial weights.
Improved convergence speed and robustness in experiments.
Abstract
Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line of work has shown that (stochastic) gradient descent tends to have an implicit bias towards low rank and/or sparse solutions when used to train deep linear networks, explaining to some extent why overparameterized neural network models trained by gradient descent tend to have good generalization performance in practice. However, existing theory for square-loss objectives often requires very small initialization of the trainable weights, which is at odds with the larger scale at which weights are initialized in practice for faster convergence and better generalization performance. In this paper, we aim to close this gap by incorporating and analyzing…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNumerical methods in inverse problems · Sparse and Compressive Sensing Techniques · Topology Optimization in Engineering
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Weight Normalization
