Any-stepsize Gradient Descent for Separable Data under Fenchel-Young Losses

Han Bao; Shinsaku Sakaue; Yuki Takezawa

arXiv:2502.04889 · stat.ML · October 14, 2025

TL;DR

This paper investigates the convergence of gradient descent with arbitrary stepsizes on separable data under Fenchel-Young losses, and argues that the separation margin of the loss function, rather than the self-bounding property, determines the convergence rate.

Contribution

The paper extends the understanding of GD convergence beyond self-bounding losses by establishing arbitrary-stepsize convergence for Fenchel-Young losses, highlighting the role of the separation margin.

Findings

01  With the Tsallis entropy, GD achieves the convergence rate T = Ω(ε^{-1/2}) for reaching ε-optimal loss.

02  With the Rényi entropy, GD achieves the convergence rate T = Ω(ε^{-1/3}).

03  The separation margin of the loss function, not the self-bounding property, governs these convergence rates.
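
For orientation, the standard definition of a Fenchel-Young loss and the usual textbook forms of the entropies named above can be written as follows. This is a sketch in assumed notation (Ω for the regularizer, △ for the probability simplex, q and α for the entropy parameters); the summary itself does not state these formulas, and the paper's normalization may differ.

```latex
% Fenchel-Young loss generated by a convex regularizer \Omega (assumed notation)
L_{\Omega}(\theta; y) = \Omega^{*}(\theta) + \Omega(y) - \langle \theta, y \rangle,
\qquad
\Omega^{*}(\theta) = \sup_{p \in \triangle} \; \langle \theta, p \rangle - \Omega(p).

% Common entropy choices, used with \Omega = -H on the probability simplex \triangle
H^{\mathrm{Shannon}}(p) = -\sum_i p_i \log p_i, \qquad
H^{\mathrm{Tsallis}}_{q}(p) = \frac{1 - \sum_i p_i^{q}}{q - 1}, \qquad
H^{\mathrm{Renyi}}_{\alpha}(p) = \frac{1}{1 - \alpha} \log \sum_i p_i^{\alpha}.
```

Taking Ω as the negative Shannon entropy recovers the logistic (softmax cross-entropy) loss, which is the self-bounding baseline the findings are contrasted with.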

Abstract

Gradient descent (GD) has been one of the most common optimizers in machine learning. In particular, the loss landscape of a neural network is typically sharpened during the initial phase of training, making the training dynamics hover on the edge of stability. This is beyond our standard understanding of GD convergence in the stable regime, where an arbitrarily chosen stepsize is sufficiently smaller than the edge of stability. Recently, Wu et al. (COLT 2024) have shown that GD converges with arbitrary stepsize under linearly separable logistic regression. Although their analysis hinges on the self-bounding property of the logistic loss, which seems to be a cornerstone for establishing a modified descent lemma, our pilot study shows that other loss functions without the self-bounding property can make GD converge with arbitrary stepsize. To further understand what property of a loss…
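
The arbitrary-stepsize behavior for separable logistic regression mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the two-point dataset `Z_TOY`, the stepsize of 10, and the helper names are all hypothetical choices made for illustration. Rows of `Z_TOY` are label-folded examples z_i = y_i x_i, separable by the direction (0, 1), and the stepsize is well above the classical 2/L threshold for this toy problem (about 0.9).

```python
import numpy as np


def logistic_loss(w, Z):
    """Mean logistic loss; rows of Z are label-folded examples z_i = y_i * x_i."""
    margins = Z @ w
    return float(np.mean(np.logaddexp(0.0, -margins)))


def logistic_grad(w, Z):
    """Gradient of the mean logistic loss with respect to w."""
    margins = Z @ w
    # sigmoid(-m), written via tanh so that large |m| cannot overflow.
    weights = 0.5 * (1.0 - np.tanh(0.5 * margins))
    return -(weights[:, None] * Z).mean(axis=0)


def run_gd(Z, stepsize, num_steps):
    """Plain GD from w = 0 with a constant stepsize; returns final w and the loss history."""
    w = np.zeros(Z.shape[1])
    losses = [logistic_loss(w, Z)]
    for _ in range(num_steps):
        w = w - stepsize * logistic_grad(w, Z)
        losses.append(logistic_loss(w, Z))
    return w, losses


# Hypothetical toy data (an assumption, not from the paper):
# both points have margin 1 along the direction (0, 1).
Z_TOY = np.array([[4.0, 1.0], [-1.0, 1.0]])

w_final, loss_history = run_gd(Z_TOY, stepsize=10.0, num_steps=200)

print("initial loss:", loss_history[0])
print("loss after 1 step:", loss_history[1])
print("final loss:", loss_history[-1])
print("final margins:", Z_TOY @ w_final)
```

On this toy problem the first step overshoots, so the loss rises above its initial value before falling; the run is therefore not a monotone descent, yet GD still ends up separating both points with a small loss.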

Videos

Any-stepsize Gradient Descent for Separable Data under Fenchel–Young Losses · SlidesLive

Taxonomy

Topics: Statistical Methods and Inference · Reservoir Engineering and Simulation Methods · Stochastic Gradient Optimization Techniques