Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
Maxim Kodryan, Ekaterina Lobacheva, Maksim Nakhodnov, Dmitry Vetrov

TL;DR
This paper studies the training dynamics of scale-invariant neural networks on the unit sphere with a fixed effective learning rate (ELR). Depending on the ELR value, training falls into one of three regimes (convergence, chaotic equilibrium, or divergence), and the paper discusses what these regimes imply for optimization.
Contribution
The paper gives a theoretical and empirical analysis of training scale-invariant networks on the sphere with a fixed effective learning rate (ELR). It identifies three training regimes and relates each one to the structure of the intrinsic loss landscape.
Findings
Three training regimes identified: convergence, chaotic equilibrium, divergence.
Distinct features of each regime relate to properties of the intrinsic loss landscape.
Insights enable improved optimization strategies for normalized networks.
Abstract
A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. However, the varying ELR may obscure certain characteristics of the intrinsic loss landscape structure. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail both on a theoretical examination of a toy example and on a thorough empirical analysis of real scale-invariant deep learning…
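To make the setup in the abstract concrete, the following is a minimal sketch of optimization on the unit sphere with a fixed step size playing the role of the ELR. It is not the authors' code: the toy loss (a Rayleigh-type quotient with hypothetical coefficients `a`), the function names, and the use of plain projected gradient descent are all assumptions chosen for illustration. The loss is scale invariant, so its gradient is orthogonal to the weights, and each step is followed by renormalization onto the sphere.

```python
import math

def loss(w, a):
    """Scale-invariant toy loss: sum(a_i * w_i^2) / sum(w_i^2).
    Rescaling w by any c > 0 leaves the value unchanged."""
    return sum(ai * wi * wi for ai, wi in zip(a, w)) / sum(wi * wi for wi in w)

def grad(w, a):
    """Gradient of the toy loss; it is orthogonal to w by scale invariance."""
    f = loss(w, a)
    sq_norm = sum(wi * wi for wi in w)
    return [2.0 * (ai - f) * wi / sq_norm for ai, wi in zip(a, w)]

def normalize(w):
    """Project a nonzero vector onto the unit sphere."""
    n = math.sqrt(sum(wi * wi for wi in w))
    return [wi / n for wi in w]

def train_on_sphere(w0, a, elr, steps):
    """Projected gradient descent on the unit sphere with a fixed step `elr`."""
    w = normalize(w0)
    losses = []
    for _ in range(steps):
        g = grad(w, a)
        # Gradient step in the ambient space, then projection back to the sphere.
        w = normalize([wi - elr * gi for wi, gi in zip(w, g)])
        losses.append(loss(w, a))
    return w, losses

# Hypothetical toy problem: the minimum on the sphere is along the first axis.
a = [0.0, 1.0, 2.0]
w0 = [1.0, 1.0, 1.0]
w_final, losses = train_on_sphere(w0, a, elr=0.1, steps=200)
```

With this small step size the toy run settles at the minimum. The sketch only illustrates the fixed-ELR update itself; it makes no attempt to reproduce the three regimes reported in the paper, which concern real scale-invariant networks and the authors' own toy example.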
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis
Methods: Early Learning Regularization
