Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes
Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa

TL;DR
This paper investigates how gradient descent behaves on logistic regression with non-separable data and large constant step sizes. It shows that GD can converge to a stable cycle instead of the minimizer, even for step sizes below the critical value 2/λ at which local convergence breaks down.
Contribution
The study characterizes the global convergence properties of gradient descent on logistic regression with large step sizes, highlighting the existence of stable cycles and the limitations of local convergence guarantees.
Findings
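For one-dimensional data, any step size below 1/λ ensures global convergence, where λ is the largest eigenvalue of the Hessian at the solution.
In higher dimensions, GD can converge to cycles even with step sizes less than 1/λ.
For step sizes between 1/λ and the critical step size 2/λ, global convergence is not guaranteed: a dataset can be constructed on which GD converges to a stable cycle, depending on the initialization.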
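The role of the step size relative to 1/λ and 2/λ can be explored numerically. The following is a minimal sketch, not the authors' construction: the tiny one-dimensional dataset, the helper names (`sigmoid`, `grad`, `hessian`, `gd`), the initial point, and the two multipliers 0.5 and 2.5 are all assumptions chosen for illustration. It estimates the minimizer with a conservatively small step, computes λ as the largest Hessian eigenvalue there, and then measures how much the last 100 GD iterates still move for each step size.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def grad(w, X, y):
    # Gradient of f(w) = mean(log(1 + exp(-y_i * x_i^T w))), labels in {-1, +1}.
    margins = y * (X @ w)
    return -(X.T @ (y * sigmoid(-margins))) / len(y)

def hessian(w, X, y):
    # Hessian of the same objective: (1/n) * sum s_i (1 - s_i) x_i x_i^T.
    s = sigmoid(y * (X @ w))
    return (X.T * (s * (1.0 - s))) @ X / len(y)

def gd(w0, X, y, step, iters):
    # Constant-step gradient descent; returns every iterate.
    traj = np.empty((iters + 1, len(w0)))
    traj[0] = w0
    for t in range(iters):
        traj[t + 1] = traj[t] - step * grad(traj[t], X, y)
    return traj

# Hypothetical non-separable 1-D dataset (no bias term): y_i * x_i takes both signs.
X = np.array([[1.0], [2.0], [1.0], [3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Reference solution via GD with the safe step 1/L, where L bounds the smoothness.
L = np.linalg.eigvalsh(X.T @ X).max() / (4 * len(y))
w_star = gd(np.zeros(1), X, y, 1.0 / L, 5000)[-1]

# lam plays the role of λ: largest Hessian eigenvalue at the (estimated) solution.
lam = np.linalg.eigvalsh(hessian(w_star, X, y)).max()

w0 = np.array([2.0])  # assumed initial point, not near the solution
spreads = {}
for c in (0.5, 2.5):  # step = c / lam: below 1/lam, and above the critical 2/lam
    tail = gd(w0, X, y, c / lam, 2000)[-100:]
    spreads[c] = float((tail.max(axis=0) - tail.min(axis=0)).max())
    print(f"step = {c}/lam: spread of last 100 iterates = {spreads[c]:.3e}")
```

A spread near zero indicates that the iterates have settled at a single point, while a clearly nonzero spread indicates that they are still oscillating. The sketch only covers a step below 1/λ and one above 2/λ; the datasets that produce stable cycles for steps between 1/λ and 2/λ come from the paper's own construction and are not reproduced here.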
Abstract
We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly-separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex: a sequence of period-doubling bifurcations begins at the critical step size 2/λ, where λ is the largest eigenvalue of the Hessian at the solution. Using a smaller-than-critical step size guarantees convergence if initialized near the solution, but does this suffice globally? In one dimension, we show that a step size less than 1/λ suffices for global convergence. However, for all step sizes between 1/λ and the critical step size 2/λ, one can construct a dataset such that GD converges to a stable cycle. In higher…
Taxonomy
Topics: Advanced Statistical Methods and Models · Face and Expression Recognition
Methods: Logistic Regression
