Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?
Si Yi Meng, Baptiste Goujaud, Antonio Orvieto, Christopher De Sa

TL;DR
This paper investigates whether gradient descent on logistic regression converges globally when all data points lie on a sphere (equal magnitude). Convergence holds in one dimension for any step size below the stability threshold, but cycling can still occur in higher dimensions.
Contribution
It proves that equal-magnitude data guarantees global convergence in one dimension for any step size below the stability threshold, and shows that the same restriction does not rule out cycling in higher dimensions.
Findings
Convergence in 1D with equal-magnitude data
Cycling behavior can occur in higher dimensions
A step size below the stability threshold does not guarantee convergence in all cases
Abstract
Gradient descent (GD) on logistic regression has many fascinating properties. When the dataset is linearly separable, it is known that the iterates converge in direction to the maximum-margin separator regardless of how large the step size is. In the non-separable case, however, it has been shown that GD can exhibit a cycling behaviour even when the step size is still below the stability threshold $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. This short paper explores whether restricting the data to have equal magnitude is a sufficient condition for global convergence, under any step size below the stability threshold. We prove that this is true in a one-dimensional space, but in higher dimensions cycling behaviour can still occur. We hope to inspire further studies on quantifying how common these cycles are in realistic datasets, as well as…
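As a rough illustration of the one-dimensional claim, the sketch below runs GD on a tiny non-separable dataset whose points all have magnitude 1, with a step size set to 90% of the threshold 2/λ (λ being the largest Hessian eigenvalue at the minimizer). Nothing here is taken from the paper: the toy dataset, the averaged (rather than summed) loss, the 0.9 factor, and the helper names `grad`, `hessian`, and `gradient_descent` are all assumptions made for this example.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad(w, Z):
    # Rows of Z are z_i = y_i * x_i; the (assumed) loss is mean(log(1 + exp(-Z @ w))).
    return -(Z * sigmoid(-Z @ w)[:, None]).mean(axis=0)

def hessian(w, Z):
    s = sigmoid(Z @ w)
    return (Z * (s * (1.0 - s))[:, None]).T @ Z / len(Z)

def gradient_descent(Z, step, w0, iters):
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        w = w - step * grad(w, Z)
    return w

# Hypothetical 1D toy data: three points with z = +1 and one with z = -1.
# Both signs are present, so the data is not separable; all magnitudes equal 1.
Z = np.array([[1.0], [1.0], [1.0], [-1.0]])

# For this toy set the gradient is sigmoid(w) - 3/4, so the minimizer is log(3).
w_star = np.array([np.log(3.0)])

lam = np.linalg.eigvalsh(hessian(w_star, Z)).max()  # largest Hessian eigenvalue at w_star
threshold = 2.0 / lam                               # stability threshold 2 / lambda

w_final = gradient_descent(Z, step=0.9 * threshold, w0=[0.0], iters=2000)
print("threshold:", threshold, "final iterate:", w_final[0], "minimizer:", w_star[0])
```

In this example the iterates overshoot and oscillate around the minimizer before settling on it, which is consistent with the one-dimensional result stated in the abstract. The sketch does not reproduce the higher-dimensional cycling examples.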
Taxonomy
Topics: Statistical Methods and Inference
