Phase Transitions between Accuracy Regimes in L2 regularized Deep Neural Networks

Ibrahim Talha Ersoy; Karoline Wiesner

arXiv:2505.06597·cs.LG·August 29, 2025

TL;DR

This paper investigates phase transitions in L2-regularized deep neural networks, links them to the curvature of the error landscape, and explains grokking as a model getting stuck in a local minimum that corresponds to a lower-accuracy phase.

Contribution

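It introduces a geometric perspective on phase transitions in DNNs based on the scalar (Ricci) curvature of the error landscape, predicts new transition points and hysteresis effects, and explains grokking as models getting stuck in a local minimum of the error surface.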

Findings

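01 Explanation of the onset-of-learning phase transition via the scalar (Ricci) curvature of the error landscape

02 Prediction of new transition points as data complexity is increased, along with hysteresis effects, both confirmed numerically

03 Explanation of grokking as models getting stuck in a local minimum of the error landscape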

Abstract

Increasing the L2 regularization of Deep Neural Networks (DNNs) causes a first-order phase transition into the under-parametrized phase, the so-called onset-of-learning. We explain this transition via the scalar (Ricci) curvature of the error landscape. We predict new transition points as the data complexity is increased and, in accordance with the theory of phase transitions, the existence of hysteresis effects. We confirm both predictions numerically. Our results provide a natural explanation of the recently discovered phenomenon of 'grokking' as DNN models getting stuck in a local minimum of the error surface, corresponding to a lower accuracy phase. Our work paves the way for new probing methods of the intrinsic structure of DNNs in and beyond the L2 context.
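
The hysteresis mentioned in the abstract can be illustrated with a toy problem. The sketch below is not the authors' setup: it assumes a hypothetical one-parameter loss, E(w) = (w^3 - 1)^2 + 3*beta*w^2, which mimics a three-layer linear chain with tied weights fitting the target 1 under an L2 penalty of strength beta. The names `grad`, `descend`, `sweep`, and `kick` are illustrative. In this toy, w = 0 is a local minimum for every beta > 0, and a second minimum near w = 1 exists only while beta stays below roughly 0.47, so warm-started gradient descent ends up in different minima depending on the direction of the sweep.

```python
import numpy as np

def grad(w, beta):
    # Derivative of the toy loss E(w) = (w**3 - 1)**2 + 3*beta*w**2.
    # (Assumed toy model, not the error function used in the paper.)
    return 6 * w**2 * (w**3 - 1) + 6 * beta * w

def descend(w, beta, lr=0.01, steps=5000):
    # Plain gradient descent from the starting point w at fixed beta.
    for _ in range(steps):
        w -= lr * grad(w, beta)
    return w

def sweep(betas, w0, kick=1e-3):
    # Warm-started sweep: each beta starts from the previous solution.
    # The small `kick` keeps w away from the exact stationary point w = 0.
    w, path = w0, []
    for beta in betas:
        w = descend(max(w, kick), beta)
        path.append(w)
    return np.array(path)

betas = np.linspace(0.0, 0.6, 61)

# Increase beta starting in the "learning" minimum near w = 1.
up = sweep(betas, w0=1.0)

# Decrease beta starting in the "non-learning" minimum at w = 0.
down = sweep(betas[::-1], w0=0.0)[::-1]

for i in (20, 45, 50):
    print(f"beta={betas[i]:.2f}  up={up[i]:.3f}  down={down[i]:.3f}")
```

On the upward sweep the solution stays near the large-w minimum until that minimum disappears and then drops to w close to 0, while the downward sweep remains near w = 0 over the same range of beta. The two branches coexisting at the same beta is the signature of a first-order transition; the toy says nothing about curvature or about the specific transition points reported in the paper.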

Taxonomy

Methods: Sparse Evolutionary Training