Optimization-Induced Dynamics of Lipschitz Continuity in Neural Networks
Róisín Luo, James McDermott, Christian Gagné, Qiang Sun, Colm O'Riordan

TL;DR
This paper develops a mathematical framework using stochastic differential equations to analyze how Lipschitz continuity in neural networks evolves during training, influenced by optimization dynamics and stochastic noise.
Contribution
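It introduces an SDE-based model of how Lipschitz continuity evolves during SGD training, and identifies three driving factors: the projection of the gradient flow onto the operator-norm Jacobian of the parameter matrices, and the projections of the mini-batch gradient noise onto the operator-norm Jacobian and onto the operator-norm Hessian.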
Findings
The framework accurately predicts Lipschitz evolution during training.
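The projection of the gradient flow onto the operator-norm Jacobian of the parameter matrices drives changes in Lipschitz continuity.
Gradient noise from mini-batch sampling affects Lipschitz continuity through its projections onto the operator-norm Jacobian and the operator-norm Hessian.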
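As a rough illustration of the gradient-projection finding (a minimal sketch, not the paper's formulation): for a single weight matrix W with a simple largest singular value, the gradient of the operator norm with respect to W is the rank-one matrix u1 v1^T built from the top singular vectors. A small gradient step W - lr * G therefore changes the operator norm by approximately -lr * <u1 v1^T, G>. The helper name `operator_norm_and_jacobian`, the random matrix `G` standing in for a loss gradient, and the step size `lr` are all assumptions made for this example.

```python
import numpy as np


def operator_norm_and_jacobian(W):
    """Return ||W||_2 and its gradient u1 v1^T.

    The gradient formula assumes the largest singular value is simple
    (not repeated); otherwise the operator norm is not differentiable at W.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return s[0], np.outer(U[:, 0], Vt[0, :])


rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))   # hypothetical parameter matrix
G = rng.standard_normal((6, 4))   # hypothetical stand-in for a loss gradient
lr = 1e-5                         # hypothetical step size

sigma, J = operator_norm_and_jacobian(W)

# First-order prediction: projection of the update -lr * G onto the Jacobian.
predicted = -lr * np.sum(J * G)

# Actual change in the operator norm after one gradient step.
sigma_new, _ = operator_norm_and_jacobian(W - lr * G)
actual = sigma_new - sigma

print(f"predicted change: {predicted:.3e}")
print(f"actual change:    {actual:.3e}")
```

The two printed values should agree up to a second-order error in `lr`, which is the sense in which the projection of the gradient onto the operator-norm Jacobian governs the deterministic part of the change.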
Abstract
Lipschitz continuity characterizes the worst-case sensitivity of neural networks to small input perturbations; yet its dynamics (i.e., temporal evolution) during training remain under-explored. We present a rigorous mathematical framework to model the temporal evolution of Lipschitz continuity during training with stochastic gradient descent (SGD). This framework leverages a system of stochastic differential equations (SDEs) to capture both deterministic and stochastic forces. Our theoretical analysis identifies three principal factors driving the evolution: (i) the projection of gradient flows, induced by the optimization dynamics, onto the operator-norm Jacobian of parameter matrices; (ii) the projection of gradient noise, arising from the randomness in mini-batch sampling, onto the operator-norm Jacobian; and (iii) the projection of the gradient noise onto the operator-norm Hessian…
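To give some intuition for why zero-mean gradient noise can still move the operator norm through a second-order (Hessian) term, here is a minimal sketch under assumptions that are not taken from the paper: the noise `xi` is isotropic Gaussian, the scale `eps` is arbitrary, and a single matrix W is perturbed in isolation. Averaging the norms at W + eps * xi and W - eps * xi cancels the first-order (Jacobian) contribution exactly, so what remains is the second-order effect. Because the operator norm is convex, this remainder is never negative.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((6, 4))   # hypothetical parameter matrix
eps = 1e-2                        # hypothetical noise scale
base = np.linalg.norm(W, 2)       # operator norm (largest singular value)

corrections = []
for _ in range(200):
    xi = rng.standard_normal(W.shape)  # hypothetical isotropic noise sample
    plus = np.linalg.norm(W + eps * xi, 2)
    minus = np.linalg.norm(W - eps * xi, 2)
    # Antithetic average: the first-order term in xi cancels,
    # leaving the second-order contribution of the noise.
    corrections.append(0.5 * (plus + minus) - base)

corrections = np.array(corrections)
mean_correction = corrections.mean()

print(f"mean second-order shift: {mean_correction:.3e}")
print(f"smallest sampled shift:  {corrections.min():.3e}")
```

In an SDE description, a term of this kind is what appears as the Itô correction involving the Hessian of the tracked quantity; the sketch only demonstrates its sign and order of magnitude for one matrix, not the paper's full system.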
