The Non-Linearity Perturbation Threshold: Width Scaling and Landscape Bifurcations in Deep Learning
Michael Alexander

TL;DR
This paper analyzes how the optimization landscape of a neural network deforms as a non-linear activation is introduced through a smooth homotopy, identifying bifurcation points and their dependence on network width, with formal verification and practical implications.
Contribution
It introduces a theoretical framework for understanding landscape bifurcations in neural networks via Morse theory and Lyapunov-Schmidt reduction, and verifies the framework's assumptions for a concrete two-layer architecture.
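For orientation, the codimension-one normal forms named in the abstract (transcritical and pitchfork) can be written in their standard textbook form. This is a sketch under assumptions: the scalar coordinate x along the one-dimensional Hessian kernel and the shifted homotopy parameter \lambda (zero at the bifurcation) are notation introduced here, not taken from the paper, and the forms hold only up to a smooth change of coordinates and sign conventions.

```latex
% Reduced bifurcation equation g(x, \lambda) = 0 after Lyapunov-Schmidt reduction.
% x: coordinate on the simple Hessian kernel (assumed notation)
% \lambda: homotopy parameter shifted so the bifurcation sits at \lambda = 0 (assumed notation)
\begin{align*}
\text{transcritical:} \quad & g(x,\lambda) = \lambda x - x^{2}, \\
\text{pitchfork:}     \quad & g(x,\lambda) = \lambda x \mp x^{3}.
\end{align*}
% Because the problem is variational, g is the derivative of a reduced potential:
\begin{align*}
\text{transcritical:} \quad & \phi(x,\lambda) = \tfrac{1}{2}\lambda x^{2} - \tfrac{1}{3}x^{3}, \\
\text{pitchfork:}     \quad & \phi(x,\lambda) = \tfrac{1}{2}\lambda x^{2} \mp \tfrac{1}{4}x^{4}.
\end{align*}
```

In both cases the trivial branch x = 0 changes its Morse index as \lambda crosses zero, which is the local event behind the handle attachment mentioned in the abstract.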
Findings
Bilinear overparameterization creates an (m-1)d-dimensional Hessian kernel at the linear endpoint.
Tikhonov regularization lifts this kernel to a floor alpha > 0; the activation homotopy softens that floor, leading to explicit bifurcation points.
The bifurcation point scales with network width, connecting to the NTK regime.
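
The kernel count and the regularization floor in the findings above can be checked numerically in a toy setting. The sketch below is an illustration under stated assumptions, not the paper's construction: it assumes a scalar-output two-layer linear model f(x) = a^T W x with the output weights a held fixed, a squared loss, and a Tikhonov penalty (alpha/2) * ||W||_F^2. The function names `linear_endpoint_hessian` and `kernel_dimension` are hypothetical. In this setting the Hessian with respect to W is the Kronecker product of a a^T with the input covariance, which has rank d, so the kernel has dimension (m-1)d.

```python
import numpy as np


def linear_endpoint_hessian(a, X, alpha=0.0):
    """Hessian w.r.t. row-major vec(W) of
        L(W) = (1/2n) * sum_i (a^T W x_i - y_i)^2 + (alpha/2) * ||W||_F^2.

    Assumed toy model: a (shape (m,)) is fixed, W has shape (m, d).
    The loss is quadratic in W, so the Hessian does not depend on W or y.
    """
    n, d = X.shape
    m = a.shape[0]
    sigma = X.T @ X / n  # empirical input covariance, shape (d, d)
    # d^2 L / dW_jk dW_j'k' = a_j a_j' * sigma_kk'  ->  kron(a a^T, sigma)
    return np.kron(np.outer(a, a), sigma) + alpha * np.eye(m * d)


def kernel_dimension(H, tol=1e-9):
    """Number of eigenvalues of the symmetric matrix H with magnitude below tol."""
    eigvals = np.linalg.eigvalsh(H)
    return int(np.sum(np.abs(eigvals) < tol))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, d, n = 5, 3, 50  # width, input dimension, sample count (arbitrary choices)
    a = rng.normal(size=m)
    X = rng.normal(size=(n, d))

    H0 = linear_endpoint_hessian(a, X)
    print("kernel dimension:", kernel_dimension(H0), "expected:", (m - 1) * d)

    alpha = 0.1
    H_reg = linear_endpoint_hessian(a, X, alpha=alpha)
    print("smallest eigenvalue with regularization:", np.linalg.eigvalsh(H_reg).min())
```

With regularization switched on, every formerly flat direction acquires curvature alpha, so the smallest Hessian eigenvalue equals alpha; this is the "floor" that the activation homotopy is said to soften.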
Abstract
We study how the optimization landscape of a neural network deforms as a non-linear activation is introduced through a smooth homotopy. Working first in an abstract local setting - a smooth one-parameter family of objective functions together with a critical branch that loses non-degeneracy through a simple Hessian kernel - we show via Lyapunov-Schmidt reduction that the local transition is controlled by the classical codimension-one normal forms (transcritical or pitchfork) and that the associated topology change is governed by Morse-theoretic handle attachment. We then move beyond the abstract framework and verify these assumptions for a concrete two-layer architecture. We prove that bilinear overparameterization creates an (m-1)d-dimensional Hessian kernel at the linear endpoint, which Tikhonov regularization lifts to a floor alpha > 0; the activation homotopy softens this floor,…
