Understanding the Implicit Regularization of Gradient Descent in Over-parameterized Models

Jianhao Ma; Geyu Liang; Salar Fattahi

arXiv:2505.17304 · cs.LG · December 10, 2025

TL;DR

This paper investigates how gradient descent implicitly favors low-dimensional solutions in over-parameterized models, and introduces Infinitesimally Perturbed Gradient Descent (IPGD) to enhance this effect, with theoretical guarantees and empirical validation.

Contribution

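The paper identifies three conditions under which gradient descent converges to second-order stationary points within an implicit low-dimensional region: suitable initialization, efficient escape from saddle points, and sustained proximity to the region. It then proposes IPGD, a method that satisfies these conditions under mild assumptions, with theoretical and empirical support for over-parameterized problems.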

Findings

01. IPGD satisfies conditions for implicit regularization under mild assumptions.

02. Theoretical guarantees are provided for over-parameterized matrix sensing.

03. Empirical results demonstrate broader applicability of IPGD.

Abstract

Implicit regularization refers to the tendency of local search algorithms to converge to low-dimensional solutions, even when such structures are not explicitly enforced. Despite its ubiquity, the mechanism underlying this behavior remains poorly understood, particularly in over-parameterized settings. We analyze gradient descent dynamics and identify three conditions under which it converges to second-order stationary points within an implicit low-dimensional region: (i) suitable initialization, (ii) efficient escape from saddle points, and (iii) sustained proximity to the region. We show that these can be achieved through infinitesimal perturbations and a small deviation rate. Building on this, we introduce Infinitesimally Perturbed Gradient Descent (IPGD), which satisfies these conditions under mild assumptions. We provide theoretical guarantees for IPGD in over-parameterized matrix…
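
As a rough illustration of the mechanism described in the abstract, the sketch below runs gradient descent with a small random initialization and tiny random perturbations on a toy, fully observed symmetric factorization problem (minimize 0.25 * ||U U^T - X*||_F^2 with a square factor U and a rank-1 target X*). This is not the paper's IPGD algorithm or its matrix sensing setup: the function names, the step size, the perturbation scale, and the toy objective are all assumptions chosen for illustration only.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def fro_norm(A):
    """Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in A for x in row) ** 0.5


def perturbed_gd(X_star, d, steps=500, lr=0.1, init_scale=1e-3, noise=1e-6, seed=0):
    """Gradient descent with tiny perturbations (illustrative, NOT the paper's IPGD).

    U is d x d, so the factorization is over-parameterized whenever
    rank(X_star) < d. All hyperparameters here are assumed values.
    """
    rng = random.Random(seed)
    # Small random initialization keeps the iterate close to the origin.
    U = [[init_scale * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
    for _ in range(steps):
        UUt = matmul(U, transpose(U))
        residual = [[UUt[i][j] - X_star[i][j] for j in range(d)] for i in range(d)]
        # Gradient of 0.25 * ||U U^T - X*||_F^2 is (U U^T - X*) U for symmetric X*.
        grad = matmul(residual, U)
        # Gradient step plus a very small Gaussian perturbation.
        U = [[U[i][j] - lr * grad[i][j] + noise * rng.gauss(0.0, 1.0)
              for j in range(d)] for i in range(d)]
    return U


d = 4
v = [0.5] * d                                   # unit vector, so X* = v v^T has rank 1
X_star = [[vi * vj for vj in v] for vi in v]
U = perturbed_gd(X_star, d)

# How well U U^T recovers the rank-1 target.
UUt = matmul(U, transpose(U))
recovery_error = fro_norm([[UUt[i][j] - X_star[i][j] for j in range(d)] for i in range(d)])

# Energy of U outside the one-dimensional signal direction v: W = U - v (v^T U).
vtU = [sum(v[i] * U[i][j] for i in range(d)) for j in range(d)]
W = [[U[i][j] - v[i] * vtU[j] for j in range(d)] for i in range(d)]
off_signal_energy = fro_norm(W)

print("recovery error:", recovery_error)
print("energy outside signal direction:", off_signal_energy)
```

In this toy problem, the part of U orthogonal to v starts small and receives no growth from the gradient, so the iterate stays close to a rank-1 factor even though U has d columns. The second printed quantity is one simple way to track that proximity.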
