Convergence of Gradient Descent on Separable Data
Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry

TL;DR
This paper investigates how gradient descent on separable data converges to maximum-margin solutions depending on the loss function's tail behavior, revealing conditions for convergence and optimal rates.
Contribution
It characterizes the conditions on the loss tail under which gradient descent converges in direction to the maximum-margin separator, and shows that aggressive step sizes can improve the rate of margin convergence.
Findings
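Gradient descent converges in direction to the maximum-margin solution for super-polynomial tailed losses, on linear networks of any depth; this does not hold for losses with heavier tails.
With a fixed step size on simple linear models, exponentially tailed losses such as the logistic loss achieve the optimal convergence rate within this family.
Aggressive step sizes can improve the margin convergence rate to log(t)/√t for linear models.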
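As a rough illustration of the findings above, the following Python sketch runs gradient descent on the exponential loss over a tiny, hypothetical two-point dataset and tracks the normalized margin. It compares a fixed step size with one aggressive schedule, η_t = η / L(w_t). That schedule, the dataset, and all function names are assumptions chosen for simplicity; this is not claimed to be the exact schedule or setup analyzed in the paper. Labels are folded into the data as z_n = y_n x_n.

```python
import numpy as np

# Hypothetical separable data with labels folded in: z_n = y_n * x_n.
# The max-margin unit direction is (1, 0) with margin 1: z1 = (1, 0) is the
# only support vector, and z2 = (2, 3) has margin 2 along (1, 0).
Z = np.array([[1.0, 0.0],
              [2.0, 3.0]])


def normalized_margin(w):
    """min_n z_n . w / ||w||_2, which should approach the max margin (here 1)."""
    return np.min(Z @ w) / np.linalg.norm(w)


def gd_fixed(eta, steps):
    """Plain gradient descent on the exponential loss L(w) = sum_n exp(-z_n . w)."""
    w = np.zeros(2)
    for _ in range(steps):
        grad = -(np.exp(-(Z @ w)) @ Z)  # dL/dw = -sum_n exp(-z_n . w) z_n
        w = w - eta * grad
    return w


def gd_aggressive(eta, steps):
    """GD with step size eta / L(w_t): an assumed example of an aggressive,
    loss-normalized schedule (not necessarily the paper's schedule)."""
    w = np.zeros(2)
    for _ in range(steps):
        m = Z @ w
        p = np.exp(-(m - m.min()))  # shift by the smallest margin to avoid underflow
        p /= p.sum()                # p_n = exp(-m_n) / L(w)
        w = w + eta * (p @ Z)       # equals w - (eta / L(w)) * grad L(w)
    return w


if __name__ == "__main__":
    w_fixed = gd_fixed(eta=0.1, steps=10_000)
    w_aggr = gd_aggressive(eta=0.5, steps=1_000)
    print("fixed step  margin gap:", 1.0 - normalized_margin(w_fixed))
    print("aggressive  margin gap:", 1.0 - normalized_margin(w_aggr))
```

In this toy example the fixed-step iterates grow in norm only roughly like log t, so the margin gap closes slowly, while the loss-normalized steps make ‖w_t‖ grow roughly linearly in t and close the gap much faster.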
Abstract
We provide a detailed study on the implicit bias of gradient descent when optimizing loss functions with strictly monotone tails, such as the logistic loss, over separable datasets. We look at two basic questions: (a) what are the conditions on the tail of the loss function under which gradient descent converges in the direction of the maximum-margin separator? (b) how does the rate of margin convergence depend on the tail of the loss function and the choice of the step size? We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of the maximum-margin solution, while this does not hold for losses with heavier tails. Within this family, for simple linear models we show that the optimal rate with a fixed step size is indeed obtained for the commonly used exponentially tailed losses such…
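For reference, the objects the abstract talks about can be written as below. This is a standard formulation assumed here (Euclidean norm, labels y_n ∈ {−1, +1}, a linear model); the tail classes are described informally, and the paper's precise tail conditions may differ.

```latex
% Objective over a linearly separable dataset {(x_n, y_n)}_{n=1}^N, and the GD update
\mathcal{L}(w) = \sum_{n=1}^{N} \ell\left(y_n\, w^\top x_n\right),
\qquad
w_{t+1} = w_t - \eta_t \nabla \mathcal{L}(w_t)

% Maximum-margin (L2) direction and the normalized margin of an iterate
\hat{w} = \arg\max_{\|w\|_2 = 1} \; \min_n \; y_n\, w^\top x_n,
\qquad
\gamma(w) = \frac{\min_n y_n\, w^\top x_n}{\|w\|_2}

% "Converges in the direction of the maximum-margin separator"
\lim_{t \to \infty} \frac{w_t}{\|w_t\|_2} = \hat{w},
\qquad
\gamma(w_t) \to \gamma(\hat{w})

% Informal tail classes as u -> infinity
\text{exponential: } \ell(u) \approx e^{-u}\ \ (\text{e.g. logistic } \ell(u) = \log(1 + e^{-u})),
\qquad
\text{polynomial: } \ell(u) \approx u^{-r},\ r > 0,
\qquad
\text{super-polynomial: } \ell(u) = o(u^{-r}) \ \text{for every } r > 0
```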
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
