PowerStep: Memory-Efficient Adaptive Optimization via $\ell_p$-Norm Steepest Descent

Yao Lu; Dengdong Fan; Shixun Zhang; Yonghong Tian

arXiv:2605.10335·cs.LG·May 12, 2026

TL;DR

PowerStep is a memory-efficient adaptive optimizer inspired by $\ell_p$-norm steepest descent. It matches Adam's convergence speed while halving optimizer memory, which makes it suitable for large-scale neural network training.

Contribution

Introduces PowerStep, an optimizer that reduces memory overhead by not storing second-moment statistics, with a proven convergence rate and experiments on Transformer models from 124M to 235B parameters.

Findings

01 PowerStep matches Adam's convergence speed.

02 Halves optimizer memory compared to Adam.

03 Remains stable with int8 quantization and large models.
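
The memory claim in finding 02 can be checked with back-of-the-envelope arithmetic. The sketch below assumes fp32 optimizer state (4 bytes per value), two state buffers per parameter for an Adam-style optimizer (first and second moments), and one buffer for a momentum-only optimizer. The function name and the byte width are illustrative assumptions, not figures reported by the paper.

```python
def optimizer_state_bytes(num_params, buffers_per_param, bytes_per_value=4):
    """Bytes of optimizer state, excluding weights and gradients.

    bytes_per_value=4 assumes fp32 state; this is an assumption for
    illustration, not a setting taken from the paper.
    """
    return num_params * buffers_per_param * bytes_per_value


# Model sizes taken from the range quoted in the abstract.
for n in (124_000_000, 235_000_000_000):
    two_buffers = optimizer_state_bytes(n, 2)  # first + second moment
    one_buffer = optimizer_state_bytes(n, 1)   # momentum only
    print(f"{n:,} params: two buffers {two_buffers / 1e9:.2f} GB, "
          f"one buffer {one_buffer / 1e9:.2f} GB")
```

Passing `bytes_per_value=1` models state held in int8, which is how the quantization mentioned in finding 03 would shrink the footprint further under the same assumptions.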

Abstract

Adaptive optimizers, most notably Adam, have become the default choice for training large-scale neural networks such as Transformers. These methods maintain running estimates of the first and second moments of the gradient, which incurs substantial memory overhead. We introduce PowerStep, a memory-efficient optimizer that achieves coordinate-wise adaptivity without storing second-moment statistics. Motivated by steepest descent under an $\ell_p$-norm geometry, we show that applying a nonlinear transform directly to a momentum buffer yields coordinate-wise adaptivity. We prove that PowerStep converges at the optimal $O(1/T)$ rate for non-convex stochastic optimization. Extensive experiments on Transformer models ranging from 124M to 235B parameters demonstrate that PowerStep matches Adam's convergence speed while halving optimizer memory. Furthermore, when combined with aggressive…
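
The idea of applying a nonlinear transform to a momentum buffer can be illustrated with the textbook $\ell_p$-norm steepest descent direction. The exact PowerStep update is not given in the text above, so the following is a minimal sketch under stated assumptions and not the authors' algorithm: the function names, the exponential-moving-average momentum, the normalization, and the default hyperparameters are all hypothetical. It relies only on a standard fact: for $1 < p < \infty$ with dual exponent $q = p/(p-1)$, the unit-$\ell_p$-norm direction that maximizes the inner product with a vector $m$ is $d_i = \operatorname{sign}(m_i)\,|m_i|^{q-1} / \|m\|_q^{q-1}$.

```python
import math


def lp_steepest_direction(m, p):
    """Unit l_p-norm direction d maximizing <m, d>, for 1 < p < inf.

    With q = p / (p - 1), the maximizer (equality case of Holder's
    inequality) is d_i = sign(m_i) * |m_i|**(q - 1) / ||m||_q**(q - 1).
    """
    q = p / (p - 1.0)
    norm_q = sum(abs(x) ** q for x in m) ** (1.0 / q)
    if norm_q == 0.0:
        return [0.0 for _ in m]  # zero vector: no descent direction
    scale = norm_q ** (q - 1.0)
    return [math.copysign(abs(x) ** (q - 1.0), x) / scale for x in m]


def sketch_step(params, grads, momentum, lr=0.01, beta=0.9, p=4.0):
    """One illustrative update with a single state buffer.

    Hypothetical rule, NOT the published PowerStep update: keep an EMA
    momentum buffer, then step along its l_p steepest descent direction.
    """
    new_m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(momentum, grads)]
    direction = lp_steepest_direction(new_m, p)
    new_params = [w - lr * di for w, di in zip(params, direction)]
    return new_params, new_m


# Usage: two parameters, one step from a zero-initialized buffer.
params, buf = sketch_step([1.0, 1.0], [3.0, -4.0], [0.0, 0.0], lr=0.1, p=2.0)
print(params, buf)
```

In this sketch the only per-parameter state is the momentum buffer, which is where the memory saving relative to a two-buffer optimizer would come from. With $p = 2$ the direction is the normalized momentum; as $p \to \infty$ (so $q \to 1$) it approaches $\operatorname{sign}(m)$, so the per-coordinate step depends less and less on the coordinate's magnitude.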

Code & Models

Repositories

yaolubrain/PowerStep (GitHub)