Tight Risk Bounds for Gradient Descent on Separable Data

Matan Schliserman; Tomer Koren

arXiv:2303.01135 · cs.LG · March 3, 2023 · 1 citation

TL;DR

This paper derives tight, general risk bounds for gradient descent on separable data with smooth loss functions. It extends previous results, simplifies their proofs, and shows that the same analysis applies to stochastic gradient methods.

Contribution

The paper provides the first tight risk lower bounds in this setting and extends the upper bounds to virtually any smooth loss function, using a simpler proof technique.

Findings

01 The upper bounds match the best previously known results.

02 The lower bounds establish that the upper bounds are tight.

03 The results extend to stochastic gradient descent.

Abstract

We study the generalization properties of unregularized gradient methods applied to separable linear classification -- a setting that has received considerable attention since the pioneering work of Soudry et al. (2018). We establish tight upper and lower (population) risk bounds for gradient descent in this setting, for any smooth loss function, expressed in terms of its tail decay rate. Our bounds take the form $\Theta\left(r_{\ell,T}^{2} / (\gamma^{2} T) + r_{\ell,T}^{2} / (\gamma^{2} n)\right)$, where $T$ is the number of gradient steps, $n$ is the size of the training set, $\gamma$ is the data margin, and $r_{\ell,T}$ is a complexity term that depends on the tail decay rate of the loss function (and on $T$). Our upper bound matches the best known upper bounds due to Shamir (2021); Schliserman and Koren (2022), while extending their applicability to virtually any smooth loss function and relaxing technical…
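To make the setting in the abstract concrete, here is a minimal sketch (not the authors' code) of unregularized gradient descent on linearly separable data with the logistic loss, together with a helper that evaluates the shape of the bound, $r^2/(\gamma^2 T) + r^2/(\gamma^2 n)$, up to constants. The data generator, the function names, the step size, and the choice of the logistic loss are all assumptions made for illustration; the value of $r_{\ell,T}$ is left as a free parameter `r`, since the abstract only states that it depends on the tail decay rate of the loss and on $T$.

```python
import numpy as np


def make_separable_data(n, d, margin, rng):
    """Hypothetical generator: points with norm <= 1, separable along e_1."""
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    y = rng.choice([-1.0, 1.0], size=n)
    # Force the first coordinate to agree with the label by at least `margin`.
    X[:, 0] = y * rng.uniform(margin, 1.0, size=n)
    # Rescale so every row has Euclidean norm at most 1.
    X /= np.sqrt(d)
    # Margin actually achieved by the unit vector e_1.
    gamma = float(np.min(y * X[:, 0]))
    return X, y, gamma


def logistic_risk(w, X, y):
    """Empirical risk (1/n) * sum log(1 + exp(-y_i <w, x_i>))."""
    z = y * (X @ w)
    return float(np.mean(np.logaddexp(0.0, -z)))


def gradient_descent(X, y, T, eta=1.0):
    """Plain, unregularized gradient descent started at w = 0."""
    w = np.zeros(X.shape[1])
    risks = [logistic_risk(w, X, y)]
    for _ in range(T):
        z = y * (X @ w)
        # s_i = 1 / (1 + exp(z_i)), computed stably; loss'(z_i) = -s_i.
        s = np.exp(-np.logaddexp(0.0, z))
        grad = -(X * (s * y)[:, None]).mean(axis=0)
        w = w - eta * grad
        risks.append(logistic_risk(w, X, y))
    return w, risks


def bound_shape(r, gamma, T, n):
    """Shape of the bound up to constants: r^2/(gamma^2 T) + r^2/(gamma^2 n)."""
    return r**2 / (gamma**2 * T) + r**2 / (gamma**2 * n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, gamma = make_separable_data(n=200, d=5, margin=0.3, rng=rng)
    w, risks = gradient_descent(X, y, T=500)
    print("margin:", gamma)
    print("initial / final empirical risk:", risks[0], risks[-1])
    # r is an illustrative placeholder, not a value taken from the paper.
    print("bound shape with r = 1:", bound_shape(1.0, gamma, 500, 200))
```

The step size `eta=1.0` is safe here because the rows of `X` have norm at most 1, which makes the empirical logistic risk 1/4-smooth, so each step cannot increase the empirical risk. Note that the sketch tracks the empirical risk only, whereas the bounds in the abstract concern the population risk.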

Videos

Tight Risk Bounds for Gradient Descent on Separable Data · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Complexity and Algorithms in Graphs