Learning threshold neurons via the "edge of stability"
Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Yin Tat Lee, Felipe Suarez, Yi Zhang

TL;DR
This paper investigates the 'edge of stability' phenomenon in neural network training with large learning rates, revealing a phase transition affecting the learning of threshold neurons and its implications for generalization.
Contribution
It provides a theoretical analysis of the edge of stability in simplified neural models, showing a phase transition that influences threshold neuron learning and generalization.
Findings
Established the edge of stability phenomenon in simplified models.
Discovered a sharp phase transition in the step size that determines whether threshold neurons are learned.
Suggested a link between the edge of stability and improved generalization.
Abstract
Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of stability" or "unstable convergence") and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural…
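The stability threshold behind the "edge of stability" name can be illustrated on a one-dimensional quadratic. This is a textbook sketch, not the simplified two-layer model analyzed in the paper. The function names and the objective f(x) = 0.5 * sharpness * x**2 are assumptions chosen for illustration. On this objective each gradient descent step multiplies x by (1 - step_size * sharpness), so the iterates shrink only when step_size * sharpness < 2.

```python
def gd_on_quadratic(sharpness, step_size, x0=1.0, num_steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2.

    Hypothetical helper for illustration; returns the list of iterates.
    """
    xs = [x0]
    x = x0
    for _ in range(num_steps):
        grad = sharpness * x          # f'(x) for the quadratic
        x = x - step_size * grad      # plain gradient descent update
        xs.append(x)
    return xs


def is_stable(sharpness, step_size):
    """Classical stability condition for a quadratic: step_size * sharpness < 2."""
    return step_size * sharpness < 2.0


if __name__ == "__main__":
    for step_size in (1.9, 2.1):
        final = gd_on_quadratic(sharpness=1.0, step_size=step_size)[-1]
        print(step_size, is_stable(1.0, step_size), abs(final))
```

With sharpness 1.0, a step size just below 2 gives oscillating but shrinking iterates, while a step size just above 2 gives oscillating and growing iterates. The non-convex dynamics discussed in the abstract are richer than this, since the curvature there changes along the trajectory.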
Taxonomy
Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Advanced Thermodynamics and Statistical Mechanics
