Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes

Maxim Kodryan; Ekaterina Lobacheva; Maksim Nakhodnov; Dmitry Vetrov

arXiv:2209.03695 · cs.LG · January 18, 2023 · 1 citation

TL;DR

This paper studies the training dynamics of scale-invariant neural networks on the unit sphere. It identifies three regimes (convergence, chaotic equilibrium, and divergence) determined by the value of the fixed effective learning rate (ELR), and discusses what they imply for optimization.

Contribution

The paper provides a theoretical and empirical analysis of training scale-invariant networks on the sphere with a fixed ELR, revealing three regimes and what each one indicates about the structure of the intrinsic loss landscape.

Findings

1. Three training regimes are identified: convergence, chaotic equilibrium, and divergence.

2. Distinct features of each regime relate to properties of the intrinsic loss landscape.

3. Insights enable improved optimization strategies for normalized networks.

Abstract

A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. However, the varying ELR may obscure certain characteristics of the intrinsic loss landscape structure. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail both on a theoretical examination of a toy example and on a thorough empirical analysis of real scale-invariant deep learning…
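The idea of optimizing a scale-invariant function directly on the unit sphere with a fixed ELR can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the toy loss is a Rayleigh quotient (chosen only because it is scale invariant), the matrix `A` is arbitrary, and the names `loss`, `grad`, `sphere_step`, and `run` are hypothetical. The sketch shows only the mechanics (gradient step, then projection back onto the sphere); it does not attempt to reproduce the three regimes.

```python
import numpy as np

def loss(w, A):
    # Toy scale-invariant loss (assumption): Rayleigh quotient of a symmetric A.
    # loss(c * w) == loss(w) for any c > 0.
    return float(w @ A @ w) / float(w @ w)

def grad(w, A):
    # Gradient of the Rayleigh quotient for symmetric A.
    # Scale invariance makes it orthogonal to w.
    n2 = w @ w
    return 2.0 * (A @ w - (w @ A @ w) / n2 * w) / n2

def sphere_step(w, A, elr):
    # One projected gradient step on the unit sphere with a fixed ELR:
    # step along the negative gradient, then renormalize to unit length.
    w_new = w - elr * grad(w, A)
    return w_new / np.linalg.norm(w_new)

def run(elr, steps=200, seed=0):
    # Hypothetical driver: random unit-norm start, fixed ELR throughout.
    rng = np.random.default_rng(seed)
    A = np.diag([1.0, 2.0, 5.0])
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)
    for _ in range(steps):
        w = sphere_step(w, A, elr)
    return w, loss(w, A), A

if __name__ == "__main__":
    w, final_loss, _ = run(elr=0.1)
    print("final loss:", final_loss, "norm:", np.linalg.norm(w))
```

Because the weights are renormalized after every step, their norm stays at 1 and the step size on the sphere is governed by `elr` alone. That is the sense in which the ELR is held fixed here, rather than varying with the weight norm.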

Code & Models

Repositories

tipt0p/three_regimes_on_the_sphere · PyTorch · Official

Videos

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes · SlidesLive

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis

Methods: Early Learning Regularization