On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD

Tongcheng Zhang; Zhanpeng Zhou; Mingze Wang; Andi Han; Wei Huang; Taiji Suzuki; Junchi Yan

arXiv:2603.10397·cs.LG·March 12, 2026

TL;DR

This paper analyzes the learning dynamics of label noise SGD in two-layer linear networks. It reveals a two-phase process in which the weight magnitudes first shrink and the weights then align with the ground-truth interpolator, which helps explain why label noise can be beneficial.

Contribution

The paper provides a theoretical analysis of how label noise shapes the learning dynamics of over-parameterized two-layer linear networks, and extends the insights to other optimization methods such as Sharpness-Aware Minimization (SAM).

Findings

01. Two-phase learning behavior identified: weight magnitudes shrink (Phase I), then the weights align with the ground-truth interpolator (Phase II).

02. Label noise drives the transition from the lazy regime to the rich regime.

03. Experimental results support the theoretical analysis.

Abstract

One crucial factor behind the success of deep learning lies in the implicit bias induced by noise inherent in gradient-based training algorithms. Motivated by empirical observations that training with noisy labels improves model generalization, we delve into the underlying mechanisms behind stochastic gradient descent (SGD) with label noise. Focusing on a two-layer over-parameterized linear network, we analyze the learning dynamics of label noise SGD, unveiling a two-phase learning behavior. In Phase I, the magnitudes of model weights progressively diminish, and the model escapes the lazy regime and enters the rich regime. In Phase II, the alignment between model weights and the ground-truth interpolator increases, and the model eventually converges. Our analysis highlights the critical role of label noise in driving the transition from the lazy to the rich regime and…
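To make the setting in the abstract concrete, here is a minimal NumPy sketch of label noise SGD on a two-layer linear network. It is not the authors' code: the parameterization f(x) = a^T W x, the squared loss, the ±sigma label perturbation, the initialization scale, and all names (`make_data`, `label_noise_sgd`, `width`, `lr`, `sigma`) are assumptions chosen for illustration, and the vector `w_star` used to generate labels is only a stand-in for the ground-truth interpolator. The sketch records the squared parameter norm and the cosine alignment of the effective predictor with `w_star`, the two quantities the abstract associates with Phase I and Phase II.

```python
import numpy as np


def make_data(n=5, d=10, seed=0):
    """Toy regression data with fewer samples than dimensions (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d)) / np.sqrt(d)  # rows have roughly unit norm
    w_star = rng.normal(size=d)
    w_star /= np.linalg.norm(w_star)          # unit-norm target direction
    y = X @ w_star                            # clean labels
    return X, y, w_star


def label_noise_sgd(X, y, w_star, width=32, lr=0.05, sigma=0.5, steps=2000, seed=1):
    """Single-sample SGD on 0.5 * (a^T W x - noisy_label)^2.

    Fresh noise of magnitude sigma is added to the label at every step.
    This noise model is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(width, d)) / np.sqrt(d)  # first layer
    a = rng.normal(size=width) / np.sqrt(width)   # second layer
    sq_norms = np.empty(steps)
    alignments = np.empty(steps)
    for t in range(steps):
        i = rng.integers(n)                       # pick one training example
        x = X[i]
        noisy_label = y[i] + sigma * rng.choice([-1.0, 1.0])
        h = W @ x                                 # hidden representation
        residual = a @ h - noisy_label
        grad_a = residual * h                     # gradient w.r.t. a
        grad_W = residual * np.outer(a, x)        # gradient w.r.t. W
        a -= lr * grad_a
        W -= lr * grad_W
        beta = W.T @ a                            # effective linear predictor
        sq_norms[t] = np.sum(W ** 2) + np.sum(a ** 2)
        alignments[t] = beta @ w_star / (np.linalg.norm(beta) + 1e-12)
    return W, a, sq_norms, alignments
```

Calling `label_noise_sgd(*make_data())` and plotting `sq_norms` and `alignments` against the step index is one way to check whether a shrinking-then-aligning pattern shows up for a given choice of `lr` and `sigma`; the sketch makes no claim about what the curves look like.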

Taxonomy

Topics: Machine Learning and Data Classification · Text and Document Classification Technologies · Machine Learning and Algorithms