Conservation Law Breaking at the Edge of Stability: A Spectral Theory of Non-Convex Neural Network Optimization

Daniel Nobrega Medeiros

arXiv:2604.07405 · cs.LG · April 10, 2026

TL;DR

This paper develops a spectral theory explaining why gradient descent reliably finds good solutions in non-convex neural network training by analyzing conservation laws, spectral properties, and dynamical regimes.

Contribution

It identifies L-1 conservation laws obeyed by gradient flow on bias-free ReLU networks, quantifies how discrete gradient descent breaks them, and derives a closed-form spectral formula for the resulting drift, validated on both linear and ReLU networks.

Findings

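01 Gradient flow on L-layer ReLU networks without bias preserves L-1 conservation laws, confining trajectories to lower-dimensional manifolds.

02 Under discrete gradient descent the laws break, with total drift scaling as eta^alpha, where alpha is approximately 1.1-1.6 depending on architecture, loss function, and width.

03 Exponential spectral compression explains self-regularization in cross-entropy loss.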
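
The first finding can be motivated by a standard homogeneity argument. The sketch below assumes a bias-free network f(x) = W_L relu(W_{L-1} ... relu(W_1 x)); the symbols \mathcal{L} (training loss) and t (flow time) are notation introduced here and may differ from the paper's, and the identities hold wherever the loss is differentiable.

```latex
% Gradient flow: \dot W_l = -\nabla_{W_l}\mathcal{L}, so
\frac{d}{dt}\|W_l\|_F^2
  = 2\,\langle W_l, \dot W_l\rangle
  = -2\,\langle W_l, \nabla_{W_l}\mathcal{L}\rangle .

% ReLU is positively homogeneous: rescaling W_l \to (1+\epsilon) W_l and
% W_{l+1} \to W_{l+1}/(1+\epsilon) leaves f, and hence \mathcal{L}, unchanged.
% Differentiating at \epsilon = 0 gives
\langle W_l, \nabla_{W_l}\mathcal{L}\rangle
  = \langle W_{l+1}, \nabla_{W_{l+1}}\mathcal{L}\rangle .

% Subtracting the two norm derivatives, for l = 1, \dots, L-1:
\frac{d}{dt} C_l
  = \frac{d}{dt}\left(\|W_{l+1}\|_F^2 - \|W_l\|_F^2\right) = 0 .
```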

Abstract

Why does gradient descent reliably find good solutions in non-convex neural network optimization, despite the landscape being NP-hard in the worst case? We show that gradient flow on L-layer ReLU networks without bias preserves L-1 conservation laws C_l = ||W_{l+1}||_F^2 - ||W_l||_F^2, confining trajectories to lower-dimensional manifolds. Under discrete gradient descent, these laws break with total drift scaling as eta^alpha where alpha is approximately 1.1-1.6 depending on architecture, loss function, and width. We decompose this drift exactly as eta^2 * S(eta), where the gradient imbalance sum S(eta) admits a closed-form spectral crossover formula with mode coefficients c_k proportional to e_k(0)^2 * lambda_{x,k}^2, derived from first principles and validated for both linear (R=0.85) and ReLU (R>0.80) networks. For cross-entropy loss, softmax probability concentration drives…
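
The eta^2 structure of the drift can be checked numerically on a toy problem. The sketch below is not the paper's experimental setup: the two-layer bias-free ReLU network, squared loss, random data, and the helper name `run_gd` are all assumptions made for illustration. It also assumes that the gradient imbalance sum S(eta) is the accumulated difference of squared gradient norms between adjacent layers, which is what expanding a single gradient descent step gives when the cross terms cancel by homogeneity.

```python
import numpy as np

def run_gd(eta, steps=200, seed=0):
    """Run full-batch GD on a toy 2-layer bias-free ReLU net (hypothetical setup).

    Returns (drift, imbalance): the change in C_1 = ||W2||_F^2 - ||W1||_F^2
    and the accumulated gradient imbalance sum_t (||G2||_F^2 - ||G1||_F^2).
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 5))          # illustrative random inputs
    y = rng.normal(size=(64, 1))          # illustrative random targets
    W1 = 0.5 * rng.normal(size=(5, 8))
    W2 = 0.5 * rng.normal(size=(8, 1))

    c_start = np.sum(W2**2) - np.sum(W1**2)
    imbalance = 0.0
    for _ in range(steps):
        pre = X @ W1                      # pre-activations
        h = np.maximum(pre, 0.0)          # ReLU
        err = h @ W2 - y                  # residual of 0.5 * mean squared error
        G2 = h.T @ err / len(X)
        G1 = X.T @ ((err @ W2.T) * (pre > 0)) / len(X)
        imbalance += np.sum(G2**2) - np.sum(G1**2)
        W1 -= eta * G1
        W2 -= eta * G2

    drift = (np.sum(W2**2) - np.sum(W1**2)) - c_start
    return drift, imbalance

if __name__ == "__main__":
    for eta in (0.005, 0.01, 0.02):
        drift, S = run_gd(eta)
        print(f"eta={eta}: drift={drift:.3e}, eta^2*S={eta**2 * S:.3e}")
```

In this toy setting the two printed columns should agree to floating-point precision, since each step changes C_1 by exactly eta^2 (||G2||_F^2 - ||G1||_F^2). Because S itself depends on eta through the trajectory, the overall drift need not scale as eta^2, which is consistent with the fitted exponents below 2 reported in the abstract.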
