Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
Jingfeng Wu, Peter Bartlett, Matus Telgarsky, Bin Yu

TL;DR
This paper shows that, in overparameterized logistic regression, early-stopped gradient descent is well-calibrated and attains vanishing excess logistic risk, whereas gradient descent run to convergence has diverging excess logistic risk and is statistically inconsistent.
Contribution
It provides a theoretical analysis of how early stopping acts as an implicit regularizer, leading to better statistical performance than full convergence in high-dimensional logistic regression.
Findings
Early-stopped GD achieves vanishing excess logistic risk.
The excess logistic risk of asymptotic GD (the iterates at convergence) diverges to infinity, so asymptotic GD is statistically inconsistent.
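Polynomially many samples suffice for early-stopped GD to attain a small excess zero-one risk, whereas any interpolating estimator, including asymptotic GD, needs exponentially many.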
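For reference, the quantities named in these findings can be written out as below. This is a sketch under standard definitions that this summary does not itself state: labels $y \in \{-1, +1\}$, a true parameter $w^*$, an empirical risk $\widehat{L}$ over $n$ samples, and a constant step size $\eta$ are all notation assumed here for illustration.

```latex
\begin{align*}
\Pr(y = 1 \mid x) &= \frac{1}{1 + \exp(-x^\top w^*)}
  && \text{(well-specified model)} \\
L(w) &= \mathbb{E}\,\ln\!\bigl(1 + \exp(-y\, x^\top w)\bigr)
  && \text{(logistic risk)} \\
\mathcal{E}_{\log}(w) &= L(w) - L(w^*)
  && \text{(excess logistic risk)} \\
\mathcal{E}_{01}(w) &= \Pr(y\, x^\top w \le 0) - \Pr(y\, x^\top w^* \le 0)
  && \text{(excess zero-one risk)} \\
w_{t+1} &= w_t - \eta\, \nabla \widehat{L}(w_t),
  \quad \widehat{L}(w) = \frac{1}{n} \sum_{i=1}^{n} \ln\!\bigl(1 + \exp(-y_i\, x_i^\top w)\bigr)
  && \text{(gradient descent)}
\end{align*}
```

In this notation, "early-stopped GD" refers to $w_t$ at some finite $t$, and "asymptotic GD" to the behavior of $w_t$ as $t \to \infty$.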
Abstract
In overparameterized logistic regression, gradient descent (GD) iterates diverge in norm while converging in direction to the maximum-margin solution, a phenomenon known as the implicit bias of GD. This work investigates additional regularization effects induced by early stopping in well-specified high-dimensional logistic regression. We first demonstrate that the excess logistic risk vanishes for early-stopped GD but diverges to infinity for GD iterates at convergence. This suggests that early-stopped GD is well-calibrated, whereas asymptotic GD is statistically inconsistent. Second, we show that to attain a small excess zero-one risk, polynomially many samples are sufficient for early-stopped GD, while exponentially many samples are necessary for any interpolating estimator, including asymptotic GD. This separation underscores the statistical benefits of early stopping in…
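The setting described in the abstract can be explored numerically with a small simulation. The following is a minimal sketch, not the authors' experiment: the problem sizes, the decaying covariance spectrum `eigs`, the true parameter `w_star`, the step size, and the rule of picking the stopping time by held-out logistic loss are all assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: more parameters (d) than training samples (n).
n, n_val, d = 100, 1000, 200
eigs = 1.0 / np.arange(1, d + 1) ** 2   # assumed decaying covariance spectrum
w_star = np.zeros(d)
w_star[:5] = 3.0                        # assumed true parameter


def sample(m):
    """Draw m points from a well-specified logistic model."""
    X = rng.standard_normal((m, d)) * np.sqrt(eigs)
    p = 1.0 / (1.0 + np.exp(-(X @ w_star)))      # P(y = +1 | x)
    y = np.where(rng.random(m) < p, 1.0, -1.0)   # labels in {-1, +1}
    return X, y


def logistic_loss(w, X, y):
    """Mean of ln(1 + exp(-y * x^T w)), computed stably."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))


def gd_with_early_stopping(X, y, X_val, y_val, step_size=1.0, num_steps=2000):
    """Run GD from zero; remember the iterate with the lowest held-out loss."""
    w = np.zeros(X.shape[1])
    best = {"step": 0, "val_loss": logistic_loss(w, X_val, y_val), "w": w.copy()}
    norms, val_losses = [], []
    for t in range(1, num_steps + 1):
        margins = y * (X @ w)
        weights = np.exp(-np.logaddexp(0.0, margins))   # sigmoid(-margin)
        grad = -(X.T @ (y * weights)) / len(y)          # gradient of mean loss
        w = w - step_size * grad
        norms.append(np.linalg.norm(w))
        val_losses.append(logistic_loss(w, X_val, y_val))
        if val_losses[-1] < best["val_loss"]:
            best = {"step": t, "val_loss": val_losses[-1], "w": w.copy()}
    return w, best, norms, val_losses


X_train, y_train = sample(n)
X_val, y_val = sample(n_val)
w_final, best, norms, val_losses = gd_with_early_stopping(
    X_train, y_train, X_val, y_val
)

print("best step (by held-out loss):", best["step"])
print("held-out loss at best step:  ", best["val_loss"])
print("held-out loss at last step:  ", val_losses[-1])
print("norm at best step / last step:", np.linalg.norm(best["w"]), norms[-1])
```

Comparing the held-out loss and the parameter norm at the selected step against the last step gives a rough view of the trade-off the abstract describes. A finite run like this can only hint at the asymptotic behavior, so it should not be read as reproducing the paper's results.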
Taxonomy
Topics: Face and Expression Recognition
Methods: Early Stopping
