Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression

Jingfeng Wu; Peter Bartlett; Matus Telgarsky; Bin Yu

arXiv:2502.13283·cs.LG·July 1, 2025

TL;DR

In overparameterized logistic regression, early-stopped gradient descent (GD) attains vanishing excess logistic risk and is well-calibrated, whereas GD run to convergence has diverging excess logistic risk and is statistically inconsistent.

Contribution

It gives a theoretical analysis of the regularization effect of early stopping in well-specified, high-dimensional logistic regression, separating early-stopped GD from GD at convergence in both excess logistic risk and the number of samples needed for a small excess zero-one risk.

Findings

01

Early-stopped GD achieves vanishing excess logistic risk.

02

For GD iterates at convergence, the excess logistic risk diverges to infinity, so asymptotic GD is statistically inconsistent.

03

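Polynomially many samples suffice for early-stopped GD to attain a small excess zero-one risk, whereas any interpolating estimator, including asymptotic GD, needs exponentially many.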
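
For reference, the quantities named in these findings can be written out as below. The notation ($w^*$, $L$, $\widehat{L}$, $\mathcal{E}$, $\eta$, $w_t$) and the zero initialization are assumptions made here for illustration and may differ from the paper's.

$$
\begin{aligned}
&\text{Well-specified model:} && \Pr(y = 1 \mid x) = \frac{1}{1 + \exp(-x^\top w^*)}, \quad y \in \{-1, +1\} \\
&\text{Logistic risk:} && L(w) = \mathbb{E}\left[\ln\left(1 + \exp(-y\, x^\top w)\right)\right] \\
&\text{Excess logistic risk:} && L(w) - L(w^*) \\
&\text{Zero-one risk:} && \mathcal{E}(w) = \Pr\left(y\, x^\top w \le 0\right) \\
&\text{Excess zero-one risk:} && \mathcal{E}(w) - \mathcal{E}(w^*) \\
&\text{GD on the empirical risk:} && w_0 = 0, \quad w_{t+1} = w_t - \eta\, \nabla \widehat{L}(w_t), \quad \widehat{L}(w) = \frac{1}{n} \sum_{i=1}^{n} \ln\left(1 + \exp(-y_i\, x_i^\top w)\right)
\end{aligned}
$$

Early stopping means reporting $w_t$ at a finite $t$; asymptotic GD refers to the behavior as $t \to \infty$.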

Abstract

In overparameterized logistic regression, gradient descent (GD) iterates diverge in norm while converging in direction to the maximum $\ell_2$-margin solution -- a phenomenon known as the implicit bias of GD. This work investigates additional regularization effects induced by early stopping in well-specified high-dimensional logistic regression. We first demonstrate that the excess logistic risk vanishes for early-stopped GD but diverges to infinity for GD iterates at convergence. This suggests that early-stopped GD is well-calibrated, whereas asymptotic GD is statistically inconsistent. Second, we show that to attain a small excess zero-one risk, polynomially many samples are sufficient for early-stopped GD, while exponentially many samples are necessary for any interpolating estimator, including asymptotic GD. This separation underscores the statistical benefits of early stopping in…
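
The setting in the abstract can be explored numerically with a small simulation. The sketch below is not the paper's experiment: the dimensions, the decaying feature scales, the true parameter `w_star`, the step size, and the function name `simulate_gd` are all hypothetical choices made for illustration. It runs GD from zero on the empirical logistic risk with more features than samples, and records the iterate norm and a held-out estimate of the logistic risk at every step.

```python
import numpy as np

def simulate_gd(n=50, d=500, steps=2000, lr=0.5, n_test=5000, seed=0):
    """GD on the empirical logistic risk with d > n.

    Returns the iterate norms and held-out logistic risks for steps 0..steps.
    All problem parameters are illustrative assumptions, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    scales = 1.0 / np.sqrt(np.arange(1, d + 1))  # hypothetical decaying feature scales
    w_star = np.zeros(d)
    w_star[:5] = 2.0                             # hypothetical true parameter

    def sample(m):
        X = rng.standard_normal((m, d)) * scales
        p = 1.0 / (1.0 + np.exp(-X @ w_star))    # well-specified: P(y = +1 | x)
        y = np.where(rng.random(m) < p, 1.0, -1.0)
        return X, y

    X, y = sample(n)
    X_test, y_test = sample(n_test)

    w = np.zeros(d)
    norms, test_risks = [], []
    for _ in range(steps + 1):
        norms.append(np.linalg.norm(w))
        # logistic loss log(1 + exp(-margin)), computed stably via logaddexp
        test_risks.append(np.mean(np.logaddexp(0.0, -y_test * (X_test @ w))))
        margins = y * (X @ w)
        # sigmoid(-margin) = 0.5 * (1 - tanh(margin / 2)), which avoids overflow
        weights = 0.5 * (1.0 - np.tanh(margins / 2.0))
        # gradient of the mean logistic loss: -mean(y_i * x_i * sigmoid(-margin_i))
        grad = -(X * (y * weights)[:, None]).mean(axis=0)
        w = w - lr * grad
    return np.array(norms), np.array(test_risks)

if __name__ == "__main__":
    norms, risks = simulate_gd()
    best = int(np.argmin(risks))
    print("step with lowest held-out logistic risk:", best)
    print("held-out risk at that step vs. last step:", risks[best], risks[-1])
    print("iterate norm at that step vs. last step:", norms[best], norms[-1])
```

Comparing the held-out risk at its best step with the risk at the last step gives an informal view of the early-stopping effect the abstract describes. A finite run cannot show divergence at convergence, so this is only a qualitative illustration under the stated assumptions.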

Taxonomy

Topics: Face and Expression Recognition

Methods: Early Stopping