All ERMs Can Fail in Stochastic Convex Optimization Lower Bounds in Linear Dimension

Tal Burla; Roi Livni

arXiv:2602.08350·cs.LG·February 10, 2026

Open Access

TL;DR

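This paper shows that Empirical Risk Minimizers (ERMs) can fail in stochastic convex optimization: there is an instance in which the sample size is linear in the dimension and learning is possible, yet the ERM is likely to overfit. It also shows that gradient descent can overfit when the horizon and learning rate grow with respect to the sample size, and it provides new lower bounds.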

Contribution

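It constructs an instance in which the sample size is linear in the dimension and learning is possible, yet the ERM is likely to be unique and to overfit. It extends this result to approximate ERMs and establishes a new generalization lower bound of $\Omega(\eta T / m^{1.5})$ for gradient descent.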

Findings

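01. ERMs can overfit even when the sample size is linear in the dimension and learning is possible.

02. (Constrained) gradient descent may overfit when the horizon $T$ and learning rate $\eta$ grow with respect to the sample size $m$; the new generalization lower bound is $\Omega(\eta T / m^{1.5})$.

03. The results narrow the gap between the best known upper bound of $O(\eta T / m)$ and existing lower bounds for the generalization of gradient descent.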
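To make the size of the remaining gap concrete, the short Python sketch below evaluates the two rates quoted in the abstract, $\eta T / m^{1.5}$ and $\eta T / m$, with all constants dropped. The function names are hypothetical and the hidden constants in the $\Omega$ and $O$ notation are not known from this page, so only the ratio between the two rates (a factor of $\sqrt{m}$) is meaningful.

```python
import math


def gd_lower_bound_rate(eta: float, T: int, m: int) -> float:
    """Rate eta * T / m^1.5 from the stated lower bound (constants omitted)."""
    return eta * T / m ** 1.5


def gd_upper_bound_rate(eta: float, T: int, m: int) -> float:
    """Rate eta * T / m from the best known upper bound (constants omitted)."""
    return eta * T / m


def gap_factor(m: int) -> float:
    """Ratio of the upper rate to the lower rate; it depends only on m."""
    return math.sqrt(m)


if __name__ == "__main__":
    # Illustrative values only; they are assumptions, not taken from the paper.
    eta, T, m = 0.1, 1000, 100
    print("lower-bound rate:", gd_lower_bound_rate(eta, T, m))
    print("upper-bound rate:", gd_upper_bound_rate(eta, T, m))
    print("gap factor sqrt(m):", gap_factor(m))
```

Under these two rates, whatever gap remains between the bounds is polynomial in $m$ (a $\sqrt{m}$ factor) for any choice of $\eta$ and $T$.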

Abstract

We study the sample complexity of the best-case Empirical Risk Minimizer (ERM) in the setting of stochastic convex optimization. We show that there exists an instance in which the sample size is linear in the dimension, learning is possible, but the Empirical Risk Minimizer is likely to be unique and to overfit. This resolves an open question by Feldman. We also extend this to approximate ERMs. Building on our construction, we also show that (constrained) Gradient Descent potentially overfits when the horizon and learning rate grow with respect to the sample size. Specifically, we provide a novel generalization lower bound of $\Omega(\eta T / m^{1.5})$ for Gradient Descent, where $\eta$ is the learning rate, $T$ is the horizon and $m$ is the sample size. This narrows down, exponentially, the gap between the best known upper bound of $O(\eta T / m)$ and existing lower bounds from previous…
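As a reminder of what the symbols $\eta$, $T$ and $m$ control, here is a minimal sketch of constrained (projected) Gradient Descent on an empirical risk. It is not the paper's construction: the quadratic loss $f(w; z) = \tfrac{1}{2}\|w - z\|^2$, the unit-ball constraint, the zero initialization and all function names are assumptions chosen only to keep the example self-contained.

```python
import math
from typing import List, Sequence

Vector = List[float]


def project_to_unit_ball(w: Sequence[float]) -> Vector:
    """Euclidean projection onto the unit ball (the assumed constraint set)."""
    norm = math.sqrt(sum(x * x for x in w))
    return list(w) if norm <= 1.0 else [x / norm for x in w]


def empirical_gradient(w: Sequence[float], sample: Sequence[Sequence[float]]) -> Vector:
    """Gradient of the empirical risk (1/m) * sum_i 0.5 * ||w - z_i||^2 (toy loss)."""
    m = len(sample)
    return [sum(w[j] - z[j] for z in sample) / m for j in range(len(w))]


def projected_gd(sample: Sequence[Sequence[float]], eta: float, T: int) -> Vector:
    """Run T projected gradient steps with learning rate eta on m = len(sample) points."""
    d = len(sample[0])
    w = [0.0] * d  # assumed initialization at the origin
    for _ in range(T):
        g = empirical_gradient(w, sample)
        w = project_to_unit_ball([w[j] - eta * g[j] for j in range(d)])
    return w


if __name__ == "__main__":
    sample = [[0.5, 0.0], [0.0, 0.5]]  # m = 2 toy points
    print(projected_gd(sample, eta=0.5, T=50))
```

On this toy loss the iterates converge to the projection of the sample mean onto the unit ball, so the sketch shows only the mechanics of the algorithm and not the overfitting behaviour the abstract describes.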

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Privacy-Preserving Technologies in Data