Tight Risk Bounds for Gradient Descent on Separable Data
Matan Schliserman, Tomer Koren

TL;DR
This paper derives tight, general risk bounds for gradient descent on separable data with smooth loss functions. The bounds extend previous results, come with simpler proofs, and carry over to stochastic gradient descent.
Contribution
It provides the first lower bounds in this setting, which establish that the upper bounds are tight, and it extends the upper bounds to nearly all smooth loss functions using a simpler proof technique.
Findings
Upper bounds match previous best results
Lower bounds establish tightness of bounds
Results extend to stochastic gradient descent
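The stochastic variant replaces the full-batch gradient with the gradient of the loss on one sampled training example. The sketch below is illustrative only. It assumes the logistic loss $\ell(z)=\log(1+e^{-z})$ as the smooth loss, sampling with replacement, a constant step size, and a hypothetical toy dataset. The exact SGD variant and step-size schedule analyzed in the paper may differ.

```python
import math
import random


def logistic_loss_grad(z: float) -> float:
    """ell'(z) for ell(z) = log(1 + exp(-z)), computed without overflow."""
    if z > 0:
        e = math.exp(-z)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(z))


def sgd(data, eta=1.0, steps=1000, seed=0):
    """Unregularized SGD from w = 0; one example sampled uniformly (with replacement) per step."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        x, y = rng.choice(data)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        g = logistic_loss_grad(margin)  # always negative, so the step raises this example's margin
        w = [wj - eta * g * y * xj for wj, xj in zip(w, x)]
    return w


def train_accuracy(w, data):
    """Fraction of examples with strictly positive margin y * <w, x>."""
    correct = sum(1 for x, y in data if y * sum(wj * xj for wj, xj in zip(w, x)) > 0)
    return correct / len(data)


if __name__ == "__main__":
    # Hypothetical 2-D toy set, separable by the direction (1, 0) with margin 0.2 and ||x|| < 1.
    data = [([0.5, 0.3], 1), ([0.2, -0.6], 1), ([-0.4, 0.5], -1), ([-0.3, -0.2], -1)]
    w = sgd(data, eta=1.0, steps=2000)
    print("w =", [round(v, 3) for v in w], "train accuracy =", train_accuracy(w, data))
```

Sampling a single example keeps the per-step cost independent of the training-set size. The seed is fixed only to make runs reproducible.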
Abstract
We study the generalization properties of unregularized gradient methods applied to separable linear classification -- a setting that has received considerable attention since the pioneering work of Soudry et al. (2018). We establish tight upper and lower (population) risk bounds for gradient descent in this setting, for any smooth loss function, expressed in terms of its tail decay rate. Our bounds take the form $\Theta\big(r_{\ell,T}^2/(\gamma^2 T) + r_{\ell,T}^2/(\gamma^2 n)\big)$, where $T$ is the number of gradient steps, $n$ is the size of the training set, $\gamma$ is the data margin, and $r_{\ell,T}$ is a complexity term that depends on the tail decay rate of the loss function $\ell$ (and on $T$). Our upper bound matches the best known upper bounds due to Shamir (2021); Schliserman and Koren (2022), while extending their applicability to virtually any smooth loss function and relaxing technical…
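To make the setting concrete, here is a minimal sketch of unregularized full-batch gradient descent on linearly separable data. It assumes the logistic loss $\ell(z)=\log(1+e^{-z})$ as one example of a smooth loss. It also assumes a hypothetical toy generator (`make_separable_data`) whose points have margin at least $\gamma$ with respect to the unit vector $(1, 0)$ and norm below 1, and a constant step size $\eta = 1$. With $\|x\| < 1$ the empirical logistic risk is $\beta$-smooth with $\beta \le 1/4$, so $\eta = 1 < 2/\beta$ and the training risk is non-increasing. The sketch tracks only the empirical risk. It does not reproduce the paper's population risk bounds or any of its experiments.

```python
import math
import random


def logistic_loss(z: float) -> float:
    """Numerically stable ell(z) = log(1 + exp(-z))."""
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def logistic_loss_grad(z: float) -> float:
    """ell'(z) = -1 / (1 + exp(z)), computed without overflow."""
    if z > 0:
        e = math.exp(-z)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_separable_data(n, gamma, seed=0):
    """Hypothetical toy data: y * x[0] >= gamma (margin w.r.t. (1, 0)) and ||x|| < 1."""
    assert 0 < gamma <= 0.7
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        x = [y * rng.uniform(gamma, 0.7), rng.uniform(-0.7, 0.7)]
        data.append((x, y))
    return data


def empirical_risk(w, data):
    return sum(logistic_loss(y * dot(w, x)) for x, y in data) / len(data)


def gradient_descent(data, eta=1.0, steps=300):
    """Unregularized full-batch GD from w = 0; returns the last iterate and the risk history."""
    n, d = len(data), len(data[0][0])
    w = [0.0] * d
    history = [empirical_risk(w, data)]
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in data:
            g = logistic_loss_grad(y * dot(w, x))  # ell'(y <w, x>), always negative
            for j in range(d):
                grad[j] += g * y * x[j] / n
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
        history.append(empirical_risk(w, data))
    return w, history


if __name__ == "__main__":
    data = make_separable_data(n=50, gamma=0.3)
    w, history = gradient_descent(data, eta=1.0, steps=300)
    print(f"empirical risk: {history[0]:.4f} -> {history[-1]:.4f}")
    print(f"||w_T|| = {math.hypot(*w):.2f}")
```

Because the logistic loss is positive everywhere, no finite $w$ attains zero training risk. As $T$ grows the risk tends to zero while $\|w_T\|$ grows without bound. This unbounded-norm regime is what makes the question of population risk nontrivial.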
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Complexity and Algorithms in Graphs
