Implicit regularization of normalized gradient descent
Cédric Josz

TL;DR
This paper investigates how normalized gradient descent, run with sufficiently slowly diminishing step sizes, implicitly favors flat minima, using variational analysis and stratification theory to explain the regularization effect.
Contribution
It introduces a novel perspective on implicit regularization through normalized gradient descent and applies variational analysis to explain its behavior.
Findings
Normalized gradient descent promotes flat minima when its step sizes diminish sufficiently slowly
Implicit regularization depends on the existence of Lyapunov functions in gradient dynamics
Nonsmooth analysis provides a framework to understand regularization effects
Abstract
How to find flat minima? We propose running normalized gradient descent, usually reserved for nonsmooth optimization, with sufficiently slowly diminishing step sizes. This induces implicit regularization towards flat minima if an appropriate Lyapunov function exists in the gradient dynamics. Our analysis shows that implicit regularization is intrinsically a question of nonsmooth analysis, for which we deploy the full power of variational analysis and stratification theory.
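The update described in the abstract can be sketched as follows. This is a minimal illustration and not the paper's implementation: the step-size schedule `alpha0 / (k + 1) ** power`, its default values, and the toy objective f(x, y) = (xy - 1)^2 are all assumptions chosen for this example. The objective's minimizers form the curve xy = 1, and the trace of its Hessian along that curve, 2(x^2 + y^2), is smallest at (1, 1) and (-1, -1), which makes it a convenient test case for flatness.

```python
import numpy as np


def normalized_gradient_descent(grad, x0, num_steps=2000, alpha0=0.1,
                                power=0.25, eps=1e-12):
    """Iterate x_{k+1} = x_k - alpha_k * g_k / ||g_k||.

    The schedule alpha_k = alpha0 / (k + 1) ** power is an assumed
    example of a slowly diminishing step size, not the paper's choice.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm < eps:
            # The gradient vanishes, so the normalized direction is undefined.
            break
        alpha = alpha0 / (k + 1) ** power
        x = x - alpha * g / norm
    return x


def f(x):
    # Toy objective (assumption): every point with x[0] * x[1] == 1 is a minimizer.
    return (x[0] * x[1] - 1.0) ** 2


def grad_f(x):
    r = x[0] * x[1] - 1.0
    return np.array([2.0 * r * x[1], 2.0 * r * x[0]])


x_final = normalized_gradient_descent(grad_f, x0=[3.0, 0.5])
print("final iterate:", x_final)
print("objective value:", f(x_final))
print("Hessian trace proxy 2 * ||x||^2:", 2.0 * float(x_final @ x_final))
```

Because every step has length exactly `alpha`, the iterates keep crossing the solution set rather than settling on it, and the step size alone controls how close they stay. Comparing the printed Hessian trace proxy with its value at the starting point, 2 * (3.0^2 + 0.5^2) = 18.5, gives a rough sense of whether the run moved towards a flatter part of the curve.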
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research · Sparse and Compressive Sensing Techniques
