Implicit regularization of normalized gradient descent

Cédric Josz

arXiv:2602.08177 · math.OC · February 10, 2026

TL;DR

This paper investigates how normalized gradient descent with sufficiently slowly diminishing step sizes implicitly favors flat minima, using tools from nonsmooth (variational) analysis to explain the underlying regularization effect.

Contribution

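It proposes normalized gradient descent, usually reserved for nonsmooth optimization, as a way to find flat minima, and recasts implicit regularization as a question of nonsmooth analysis, which it addresses with variational analysis and stratification theory.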

Findings

1. Normalized gradient descent is implicitly regularized towards flat minima when its step sizes diminish sufficiently slowly.

2. This implicit regularization holds if an appropriate Lyapunov function exists in the gradient dynamics.

3. Variational analysis and stratification theory provide the framework for analyzing these regularization effects.

Abstract

How to find flat minima? We propose running normalized gradient descent, usually reserved for nonsmooth optimization, with sufficiently slowly diminishing step sizes. This induces implicit regularization towards flat minima if an appropriate Lyapunov function exists in the gradient dynamics. Our analysis shows that implicit regularization is intrinsically a question of nonsmooth analysis, for which we deploy the full power of variational analysis and stratification theory.
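The abstract does not spell out the update rule or the exact step-size condition, so the following is only a minimal sketch under stated assumptions: it uses the standard normalized gradient step x_{k+1} = x_k − α_k ∇f(x_k)/‖∇f(x_k)‖ with a hypothetical schedule α_k = α_0/(k+1)^p, on a toy loss f(x, y) = (xy − 1)^2 chosen purely for illustration (it is not taken from the paper). The names `normalized_gd`, `grad_loss` and `step_size` are likewise hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def normalized_gd(
    grad: Callable[[Vector], Vector],
    x0: Sequence[float],
    step: Callable[[int], float],
    num_iters: int,
) -> List[Vector]:
    """Run x_{k+1} = x_k - step(k) * g_k / ||g_k|| and return all iterates."""
    x = list(x0)
    iterates = [list(x)]
    for k in range(num_iters):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            # Exact critical point: the normalized direction is undefined, so stop.
            break
        alpha = step(k)
        # Only the gradient's direction is used; every step has length alpha.
        x = [xi - alpha * gi / norm for xi, gi in zip(x, g)]
        iterates.append(list(x))
    return iterates


# Hypothetical toy problem: f(x, y) = (x*y - 1)^2, whose minimizers form the
# curve x*y = 1. At a minimizer the Hessian trace is 2*(x^2 + y^2), so the
# balanced points (1, 1) and (-1, -1) are the flattest by that measure.
def loss(p: Vector) -> float:
    return (p[0] * p[1] - 1.0) ** 2


def grad_loss(p: Vector) -> Vector:
    r = p[0] * p[1] - 1.0
    return [2.0 * r * p[1], 2.0 * r * p[0]]


def step_size(k: int, alpha0: float = 0.1, power: float = 0.5) -> float:
    # Assumed schedule alpha_k = alpha0 / (k + 1)^power: it tends to zero,
    # while its sum diverges for power <= 1.
    return alpha0 / (k + 1) ** power


if __name__ == "__main__":
    path = normalized_gd(grad_loss, [3.0, 0.1], step_size, num_iters=20000)
    x_final = path[-1]
    print("final point:", x_final)
    print("loss:", loss(x_final))
    print("x^2 + y^2 (half the Hessian trace on the minimizing curve):",
          x_final[0] ** 2 + x_final[1] ** 2)
```

Because each step has length exactly α_k, the iterate does not come to rest at a minimizer while α_k > 0; near the curve of minimizers it keeps crossing the valley with an amplitude on the order of α_k. Intuitively, this shrinking oscillation is what leaves room for a slow drift along the valley. Whether that drift actually reaches a flat minimum, and under which step-size schedules, is the kind of question the paper's analysis addresses; the sketch itself makes no claim about it.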

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research · Sparse and Compressive Sensing Techniques