Scale Normalization
Henry Z. Lo, Kevin Amaral, Wei Ding

TL;DR
This paper explores the importance of maintaining scale or isometry in deep neural networks beyond initialization, proposing methods that improve training speed by preserving scale during learning.
Contribution
It introduces two methods for maintaining isometry during training, one exact and one stochastic, and reports preliminary experiments in which both accelerate learning.
Findings
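In preliminary experiments, both determinant normalization and scale normalization speed up learning.
Results suggest isometry matters most at the beginning of learning.
Maintaining isometry beyond initialization leads to faster learning.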
Abstract
One difficulty in training deep neural networks is improper scaling between layers. Scaling issues introduce exploding/vanishing gradient problems and have typically been addressed by careful scale-preserving initialization. We investigate the value of preserving scale, or isometry, beyond the initial weights. We propose two methods of maintaining isometry, one exact and one stochastic. Preliminary experiments show that both determinant normalization and scale normalization effectively speed up learning. Results suggest that isometry is important in the beginning of learning, and that maintaining it leads to faster learning.
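The abstract does not spell out either method, so the following is only a rough sketch of what preserving scale can mean for a single linear layer. It assumes, as one plausible reading of "determinant normalization", that a square weight matrix is rescaled so its determinant has unit magnitude; this is an illustration, not the authors' confirmed procedure. The function names `determinant_normalize` and `scale_ratio` are hypothetical.

```python
import numpy as np

def determinant_normalize(W):
    """Rescale a square matrix so that |det(W)| == 1.

    Assumption: this is one plausible reading of "determinant
    normalization", not necessarily the paper's exact method.
    """
    n = W.shape[0]
    # slogdet avoids overflow/underflow when the determinant is extreme.
    sign, logabsdet = np.linalg.slogdet(W)
    if sign == 0:
        raise ValueError("singular matrix cannot be determinant-normalized")
    # Multiplying W by c scales det(W) by c**n, so c = |det W|**(-1/n).
    return W * np.exp(-logabsdet / n)

def scale_ratio(W, x):
    """Ratio of output norm to input norm for the linear map x -> W @ x."""
    return np.linalg.norm(W @ x) / np.linalg.norm(x)

rng = np.random.default_rng(0)

# A badly scaled layer: its determinant is far from 1 in magnitude.
W = rng.normal(size=(4, 4)) * 3.0
W_norm = determinant_normalize(W)
abs_det = abs(np.linalg.det(W_norm))

# An orthogonal matrix is an exact isometry: it preserves vector norms.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
x = rng.normal(size=4)
iso_ratio = scale_ratio(Q, x)
```

Here `abs_det` comes out at 1 up to floating-point error, and `iso_ratio` is 1 for the orthogonal matrix `Q`. A unit determinant preserves volume only. It does not make the map an isometry in every direction, which is a stricter condition than the determinant constraint used in this sketch.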
Taxonomy
Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
