Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization

Xinyu Luo; Cedar Site Bai; Bolian Li; Petros Drineas; Ruqi Zhang; Brian Bullins

arXiv:2506.06606·cs.LG·June 11, 2025

TL;DR

Stacey is an accelerated $\ell_p$ steepest descent algorithm tailored to non-Euclidean optimization in deep learning, with theoretical guarantees and empirical improvements over SGD, AdamW, and Lion.

Contribution

The paper presents a novel accelerated $\ell_p$ steepest descent algorithm with primal-dual iterates, addressing non-Euclidean structures in deep network training.

Findings

01 Faster convergence compared to SGD, AdamW, and Lion.

02 Higher final accuracy in image classification and LLM pretraining.

03 Effectiveness of non-Euclidean approaches demonstrated across datasets.

Abstract

While popular optimization methods such as SGD, AdamW, and Lion depend on steepest descent updates in either $\ell_{2}$ or $\ell_{\infty}$ norms, there remains a critical gap in handling the non-Euclidean structure observed in modern deep network training. In this work, we address this need by introducing a new accelerated $\ell_{p}$ steepest descent algorithm, called Stacey, which uses interpolated primal-dual iterate sequences to effectively navigate non-Euclidean smooth optimization tasks. In addition to providing novel theoretical guarantees for the foundations of our algorithm, we empirically compare our approach against these popular methods on tasks including image classification and large language model (LLM) pretraining, demonstrating both faster convergence and higher final accuracy. We further evaluate different values of $p$ across various models and datasets, underscoring the…
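
To make the $\ell_2$ versus $\ell_\infty$ contrast in the abstract concrete, here is a minimal sketch of a single, unaccelerated $\ell_p$ steepest descent step. It is not the Stacey algorithm: it has no acceleration, no primal-dual interpolation, and no stochastic gradients. The function name `lp_steepest_descent_step` and the parameters `p` and `lr` are illustrative assumptions, not names from the paper. The step uses the standard closed-form minimizer of $\langle g, d \rangle + \frac{1}{2\,\mathrm{lr}} \|d\|_p^2$, which depends on the dual exponent $q = p/(p-1)$.

```python
import math


def lp_steepest_descent_step(x, grad, p, lr):
    """One plain l_p steepest descent step (illustrative; not Stacey itself).

    Minimizes <grad, d> + (1 / (2 * lr)) * ||d||_p^2 over d. The closed form is
    d_i = -lr * ||grad||_q^(2 - q) * sign(grad_i) * |grad_i|^(q - 1),
    where q is the dual exponent satisfying 1/p + 1/q = 1.
    """
    if p <= 1:
        raise ValueError("p must be greater than 1")
    q = p / (p - 1.0)  # dual exponent
    grad_q_norm = sum(abs(g) ** q for g in grad) ** (1.0 / q)
    if grad_q_norm == 0.0:
        return list(x)  # zero gradient: nothing to update
    scale = lr * grad_q_norm ** (2.0 - q)
    return [
        xi - scale * math.copysign(abs(g) ** (q - 1.0), g)
        for xi, g in zip(x, grad)
    ]


x = [1.0, -2.0]
grad = [0.5, -1.5]
# p = 2 gives q = 2, so the step reduces to ordinary gradient descent.
print(lp_steepest_descent_step(x, grad, p=2.0, lr=0.1))
# Large p pushes q toward 1, so coordinates move by nearly equal magnitudes,
# approaching a sign-based update scaled by the l_1 norm of the gradient.
print(lp_steepest_descent_step(x, grad, p=64.0, lr=0.1))
```

Intermediate values of `p` interpolate between these two behaviors, which is the range of geometries the abstract refers to when it mentions evaluating different values of $p$.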

Videos

Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Privacy-Preserving Technologies in Data