Optimization and Generalization Guarantees for Weight Normalization
Pedro Cisneros-Velarde, Zhijie Chen, Sanmi Koyejo, Arindam Banerjee

TL;DR
This paper provides the first theoretical analysis of optimization and generalization for deep neural networks with weight normalization, establishing convergence guarantees and generalization bounds.
Contribution
It offers the first theoretical characterizations of optimization and generalization for WeightNorm models, including spectral norm bounds and convergence guarantees.
Findings
The spectral norm of the Hessian of a WeightNorm network depends on the network width and the weight normalization terms.
Gradient descent training is guaranteed to converge under suitable assumptions.
The generalization bound is independent of the width and sublinear in the depth.
Abstract
Weight normalization (WeightNorm) is widely used in practice for the training of deep neural networks, and modern deep learning libraries have built-in implementations of it. In this paper, we provide the first theoretical characterizations of both optimization and generalization of deep WeightNorm models with smooth activation functions. For optimization, from the form of the Hessian of the loss, we note that a small Hessian of the predictor leads to a tractable analysis. Thus, we bound the spectral norm of the Hessian of WeightNorm networks and show its dependence on the network width and weight normalization terms, the latter being unique to networks with WeightNorm. Then, we use this bound to establish training convergence guarantees under suitable assumptions for gradient descent. For generalization, we use WeightNorm to get a uniform convergence based generalization bound, which…
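For readers unfamiliar with the technique, the following is a minimal sketch of the commonly used weight normalization reparameterization, in which each weight vector is written as a scale times a unit direction, w = g · v / ‖v‖. It is an illustration only: the names `weight_norm`, `row_norms`, `v`, and `g` are hypothetical, the numbers are made up, and the exact parameterization analyzed in the paper may differ from this form.

```python
import math

def weight_norm(v, g):
    """Effective weights: row i is g[i] * v[i] / ||v[i]||_2."""
    w = []
    for row, scale in zip(v, g):
        norm = math.sqrt(sum(x * x for x in row))  # the normalization term ||v_i||
        w.append([scale * x / norm for x in row])
    return w

def row_norms(w):
    """Euclidean norm of each row of w."""
    return [math.sqrt(sum(x * x for x in row)) for row in w]

# Hypothetical direction parameters (one row per output unit) and scales.
v = [[3.0, 4.0], [1.0, -2.0], [0.5, 0.5]]
g = [1.0, 2.0, 0.1]

w = weight_norm(v, g)

# Rescaling v leaves the effective weights unchanged: only its direction matters.
w_rescaled = weight_norm([[10.0 * x for x in row] for row in v], g)
```

In this sketch each row of `w` has norm equal to the corresponding entry of `g`, and multiplying `v` by a positive constant does not change `w`. The norms ‖v‖ in the denominator are what the abstract calls the weight normalization terms.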
Taxonomy
Topics: Advanced Numerical Analysis Techniques · Advanced Data Compression Techniques · Transport Systems and Technology
Methods: Weight Normalization
