Does Weight Decay Enhance Training Stability?
Marius Saether, Amir Kolic, Tomaso Poggio, Pierfrancesco Beneventano

TL;DR
This paper investigates how weight decay influences training stability in deep learning, revealing architecture-dependent phase transitions and mechanisms that affect parameter dynamics and loss sharpness.
Contribution
It uncovers the effects of weight decay on the Edge of Stability, models the phenomena mathematically, and challenges traditional stability diagnostics based on curvature thresholds.
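For reference on the curvature thresholds mentioned above, here is a minimal sketch of the textbook stability argument for full-batch gradient descent on a one-dimensional quadratic. It assumes weight decay is implemented as a coupled L2 penalty (λ/2)x² added to a loss with curvature a. Each step multiplies x by 1 − η(a + λ), so the iterates stay bounded only when a + λ < 2/η. Measured on the unregularized loss, the classical 2/η sharpness threshold therefore shifts to 2/η − λ. The helpers `gd_quadratic` and `stability_threshold` are hypothetical names for illustration, not code from the paper.

```python
# Hypothetical helpers (illustration only, not from the paper): textbook
# linear-stability analysis of gradient descent with coupled L2 weight decay.

def gd_quadratic(a, eta, lam, x0=1.0, steps=100):
    """Run GD on f(x) = a/2 * x**2 + lam/2 * x**2 and return the final iterate.

    Each step multiplies x by (1 - eta * (a + lam)), so the iterates shrink
    only if |1 - eta * (a + lam)| < 1.
    """
    x = x0
    for _ in range(steps):
        grad = a * x + lam * x  # gradient of the loss plus the L2 penalty
        x = x - eta * grad
    return x


def stability_threshold(eta, lam):
    """Largest loss curvature a that keeps GD stable: a + lam < 2 / eta."""
    return 2.0 / eta - lam


eta, lam = 0.1, 1.0
thr = stability_threshold(eta, lam)
print(thr)                                   # threshold on the unregularized curvature
print(abs(gd_quadratic(thr - 0.5, eta, lam)))  # just below the threshold: decays
print(abs(gd_quadratic(thr + 0.5, eta, lam)))  # just above the threshold: grows
```

With η = 0.1 and λ = 1.0 the threshold is 19. A curvature of 18.5 decays and 19.5 grows, although 19.5 would be stable without weight decay. This toy model only illustrates the classical diagnostic. It does not model the architecture-dependent effects the paper reports.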
Findings
Weight decay slows progressive sharpening of the loss landscape.
In CNNs, weight decay dampens oscillations at the Edge of Stability.
In MLPs, increasing weight decay causes a phase transition in sharpness stabilization.
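In the Edge of Stability literature, sharpness usually means the largest eigenvalue of the training-loss Hessian. Tracking it over training is how progressive sharpening and EoS oscillations are observed. Below is a minimal sketch of estimating it by power iteration, assuming a user-supplied gradient function `grad_fn`. Hessian-vector products are approximated here by central finite differences of the gradient, a simplification of the automatic-differentiation products typically used in practice. The names `hvp_fd` and `sharpness` are hypothetical. Power iteration returns the largest-magnitude eigenvalue, so it matches the sharpness only when that eigenvalue is positive.

```python
import numpy as np


def hvp_fd(grad_fn, theta, v, eps=1e-4):
    """Approximate the Hessian-vector product H(theta) @ v with central differences."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2.0 * eps)


def sharpness(grad_fn, theta, iters=200, seed=0):
    """Estimate the top Hessian eigenvalue at theta by power iteration.

    Assumes the largest-magnitude eigenvalue is positive; otherwise the
    estimate is that (negative) eigenvalue instead.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(theta.shape)
    v /= np.linalg.norm(v)
    estimate = 0.0
    for _ in range(iters):
        hv = hvp_fd(grad_fn, theta, v)
        estimate = float(v @ hv)  # Rayleigh quotient for the current unit vector
        v = hv / np.linalg.norm(hv)
    return estimate


# Usage on a quadratic loss 0.5 * th^T A th, whose Hessian is A everywhere.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
print(sharpness(lambda th: A @ th, np.zeros(2)))  # ≈ 4.618, the top eigenvalue of A
```

For a loss with a coupled L2 penalty, the Hessian of the regularized objective is the loss Hessian plus λI. It matters whether sharpness is measured with or without the penalty term.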
Abstract
In modern deep learning, weight decay is often credited with "stabilizing" training dynamics, diverging from its classical role as a static regularization penalty. We investigate a fundamental question: *does weight decay stabilize training dynamics, and if so, through which mechanism?* Indeed, training stability is understood through different but related notions in the literature. We consider how weight decay affects the parameter-space dynamics and loss sharpness by analyzing its effects at the *Edge of Stability* (EoS). We show that weight decay robustly slows *progressive sharpening*. Furthermore, we uncover a striking architecture-dependent phase transition. In CNNs, weight decay dampens the oscillations at the EoS, while in MLPs, increasing weight decay causes a phase transition in which the sharpness stabilizes at a threshold significantly below the theoretical…
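The "classical role" of weight decay can be written out for plain (stochastic) gradient descent. The notation here is chosen for illustration: θ for the parameters, L for the training loss, η for the step size, and λ for the weight-decay coefficient. Adding the penalty (λ/2)‖θ‖² to the loss is then the same as shrinking the weights by a factor 1 − ηλ before each gradient step. This is a standard identity, not a result of the paper. It does not hold for adaptive optimizers such as Adam, which is the motivation for decoupled weight decay (AdamW).

```latex
\begin{aligned}
\theta_{t+1} &= \theta_t - \eta\,\nabla\!\left(L(\theta_t) + \tfrac{\lambda}{2}\lVert\theta_t\rVert^2\right) \\
&= \theta_t - \eta\left(\nabla L(\theta_t) + \lambda\,\theta_t\right) \\
&= (1 - \eta\lambda)\,\theta_t - \eta\,\nabla L(\theta_t).
\end{aligned}
```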
