Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?

Si Yi Meng; Baptiste Goujaud; Antonio Orvieto; Christopher De Sa

arXiv:2507.11228 · cs.LG · July 16, 2025

TL;DR

This paper studies the convergence of gradient descent on logistic regression when all data points lie on a sphere (i.e., have equal magnitude). It shows that in higher dimensions, gradient descent can still cycle even when the step size is below the stability threshold.

Contribution

It proves that equal-magnitude data guarantees global convergence for any step size below the stability threshold in one dimension. This guarantee does not extend to higher dimensions, where gradient descent can still cycle.
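For reference, the setting can be written in standard notation. This is our notation and may differ from the paper's. Each label $y_i \in \{-1, +1\}$ is folded into its feature vector $a_i$ to give $x_i = y_i a_i$, and "equal magnitude" means every $x_i$ lies on a sphere of radius $r$:

```latex
\begin{aligned}
L(w) &= \frac{1}{n}\sum_{i=1}^{n} \log\bigl(1 + e^{-x_i^\top w}\bigr),
  \qquad x_i = y_i a_i,\quad \|x_i\| = r \ \text{for all } i, \\
\nabla L(w) &= -\frac{1}{n}\sum_{i=1}^{n} \sigma\bigl(-x_i^\top w\bigr)\, x_i,
  \qquad \sigma(z) = \frac{1}{1+e^{-z}}, \\
\nabla^2 L(w) &= \frac{1}{n}\sum_{i=1}^{n} \sigma\bigl(x_i^\top w\bigr)\,\sigma\bigl(-x_i^\top w\bigr)\, x_i x_i^\top, \\
w_{t+1} &= w_t - \eta\, \nabla L(w_t),
  \qquad \lambda = \lambda_{\max}\bigl(\nabla^2 L(w^\star)\bigr),\quad \eta < \frac{2}{\lambda}.
\end{aligned}
```

In one dimension, equal magnitude forces every $x_i \in \{-r, r\}$. When both signs occur, the data are non-separable and the minimizer is finite. If a fraction $p$ of the points sit at $+r$, then $\nabla L(w) = r\,(\sigma(rw) - p)$. This gives $w^\star = \tfrac{1}{r}\log\tfrac{p}{1-p}$ and $\lambda = r^2 p(1-p)$, so the threshold is $2/\lambda = 2/(r^2 p(1-p))$.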

Findings

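01 Convergence in 1D with equal-magnitude data, for any step size below the stability threshold

02 Cycling can still occur in higher dimensions, even with equal-magnitude data

03 A step size below the stability threshold does not, in general, guarantee global convergence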

Abstract

Gradient descent (GD) on logistic regression has many fascinating properties. When the dataset is linearly separable, it is known that the iterates converge in direction to the maximum-margin separator regardless of how large the step size is. In the non-separable case, however, it has been shown that GD can exhibit a cycling behaviour even when the step size is still below the stability threshold $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. This short paper explores whether restricting the data to have equal magnitude is a sufficient condition for global convergence, under any step size below the stability threshold. We prove that this is true in a one-dimensional space, but in higher dimensions cycling behaviour can still occur. We hope to inspire further studies on quantifying how common these cycles are in realistic datasets, as well as…
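The one-dimensional claim can be illustrated with a minimal NumPy sketch. This is not the paper's code, and it assumes labels are folded into the data as $x_i = y_i a_i$. The dataset is hypothetical (seven points at $+1$ and three at $-1$, so $r = 1$) and is not taken from the paper. The sketch computes $\lambda$ from the Hessian at the closed-form minimizer $w^\star = \log(7/3)$, picks $\eta = 0.9 \cdot 2/\lambda$, and runs plain GD. It checks one instance only and proves nothing about the general case.

```python
import numpy as np


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def grad(w, X):
    """Gradient of L(w) = mean_i log(1 + exp(-x_i^T w)); rows of X are x_i = y_i * a_i."""
    return -(X * sigmoid(-X @ w)[:, None]).mean(axis=0)


def hessian(w, X):
    """Hessian (1/n) * sum_i sigma(x_i^T w) * sigma(-x_i^T w) * x_i x_i^T."""
    s = sigmoid(X @ w)
    return (X.T * (s * (1.0 - s))) @ X / X.shape[0]


def gradient_descent(X, w0, eta, iters):
    """Plain GD with a constant step size eta."""
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        w = w - eta * grad(w, X)
    return w


# Hypothetical toy data: 1D, equal magnitude r = 1, both signs present (non-separable).
X = np.array([[1.0]] * 7 + [[-1.0]] * 3)

# Closed-form minimizer in this 1D case: sigma(w*) = 7/10, i.e. w* = log(7/3).
w_star = np.array([np.log(7.0 / 3.0)])

# Largest Hessian eigenvalue at the solution (here 0.7 * 0.3 = 0.21) and a step below 2 / lambda.
lam = np.linalg.eigvalsh(hessian(w_star, X)).max()
eta = 0.9 * 2.0 / lam

w_final = gradient_descent(X, w0=[0.0], eta=eta, iters=10_000)
print(w_final, w_star, eta)
```

Near the solution, the iterates oscillate around $w^\star$ with local contraction factor $|1 - \eta\lambda| = 0.8$. Pushing $\eta$ closer to $2/\lambda$ slows this final rate. A one-dimensional sketch like this cannot exhibit the higher-dimensional cycles the paper describes.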

Taxonomy

Topics: Statistical Methods and Inference