All ERMs Can Fail in Stochastic Convex Optimization Lower Bounds in Linear Dimension
Tal Burla, Roi Livni

TL;DR
This paper shows that, in stochastic convex optimization, Empirical Risk Minimizers (ERMs) can overfit even when the sample size is linear in the dimension. It also shows that gradient descent can overfit when its horizon and learning rate grow relative to the sample size, and it provides new lower bounds.
Contribution
It constructs an instance in which the sample size is linear in the dimension and learning is possible, yet the ERM is likely to be unique and to overfit. It extends the analysis to approximate ERMs and establishes a new generalization lower bound for (constrained) gradient descent.
Findings
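ERM can overfit in stochastic convex optimization even when the sample size is linear in the dimension.
Gradient descent may overfit when the horizon and learning rate grow with respect to the sample size; the paper proves a new generalization lower bound for this regime.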
The results narrow the gap between known upper and lower bounds for generalization in gradient descent.
Abstract
We study the sample complexity of the best-case Empirical Risk Minimizer in the setting of stochastic convex optimization. We show that there exists an instance in which the sample size is linear in the dimension, learning is possible, but the Empirical Risk Minimizer is likely to be unique and to overfit. This resolves an open question by Feldman. We also extend this to approximate ERMs. Building on our construction, we also show that (constrained) Gradient Descent potentially overfits when the horizon and learning rate grow with respect to the sample size. Specifically, we provide a novel generalization lower bound for Gradient Descent, expressed in terms of the learning rate, the horizon and the sample size. This narrows down, exponentially, the gap between the best known upper bound and existing lower bounds from previous…
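To make the quantities named in the abstract concrete, the following is a minimal sketch of constrained (projected) gradient descent on the empirical risk, with learning rate `eta`, horizon `T` and sample size `m`. The quadratic toy loss, the data distribution, the unit-ball constraint and every function name are assumptions chosen for illustration. This is not the paper's hard instance and it does not reproduce the paper's lower bound.

```python
import math
import random


def sample_points(m, d, rng):
    """Draw m points uniformly from {-1/sqrt(d), +1/sqrt(d)}^d (each has norm 1)."""
    s = 1.0 / math.sqrt(d)
    return [[rng.choice((-s, s)) for _ in range(d)] for _ in range(m)]


def project_to_unit_ball(w):
    """Euclidean projection onto the constraint set {w : ||w|| <= 1}."""
    norm = math.sqrt(sum(x * x for x in w))
    return list(w) if norm <= 1.0 else [x / norm for x in w]


def empirical_risk(w, sample):
    """Average of the toy convex loss f(w, z) = 0.5 * ||w - z||^2 over the sample."""
    total = 0.0
    for z in sample:
        total += 0.5 * sum((wi - zi) ** 2 for wi, zi in zip(w, z))
    return total / len(sample)


def population_risk(w):
    """Exact population risk for this toy distribution, using E[z] = 0 and ||z|| = 1."""
    return 0.5 * (sum(x * x for x in w) + 1.0)


def projected_gd(sample, eta, T):
    """Run T steps of projected gradient descent on the empirical risk, starting at 0."""
    m, d = len(sample), len(sample[0])
    w = [0.0] * d
    for _ in range(T):
        # Gradient of the empirical risk: w minus the sample mean, per coordinate.
        grad = [sum(w[i] - z[i] for z in sample) / m for i in range(d)]
        w = project_to_unit_ball([w[i] - eta * grad[i] for i in range(d)])
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    sample = sample_points(m=20, d=10, rng=rng)
    w = projected_gd(sample, eta=0.5, T=50)
    gap = population_risk(w) - empirical_risk(w, sample)
    print(f"generalization gap: {gap:.4f}")
```

In this toy problem the gap at the empirical minimizer equals the squared norm of the sample mean, so it shrinks as `m` grows. The paper's results concern a different, carefully constructed instance.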
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Privacy-Preserving Technologies in Data
