Streaming Krylov-Accelerated Stochastic Gradient Descent
Stephen Thomas

TL;DR
This paper introduces SKA-SGD, a new stochastic gradient descent method that uses Krylov subspace projections to accelerate convergence on ill-conditioned problems, with proven numerical stability and GPU efficiency.
Contribution
The paper extends s-step Krylov methods to stochastic optimization, providing a novel projection technique, stability analysis, and GPU implementation for faster convergence.
Findings
Achieves backward error near machine precision with O(s^2) complexity.
Outperforms standard SGD and Adam in convergence rate and accuracy.
Identifies GPU communication-avoidance benefits at moderate processor scales.
Abstract
We present SKA-SGD (Streaming Krylov-Accelerated Stochastic Gradient Descent), a novel optimization approach that accelerates convergence for ill-conditioned problems by projecting stochastic gradients onto a low-dimensional Krylov subspace. Directly inspired by recent advances in s-step Conjugate Gradient methods with streaming Gauss-Seidel Gram solvers \cite{dambra2025sstep}, our method extends these techniques to the stochastic optimization domain. Our approach combines three key innovations: (1) projection coefficients computed via a single streaming Gauss-Seidel iteration, which is mathematically equivalent to Modified Gram-Schmidt orthogonalization; (2) a Chebyshev polynomial basis for constructing the Krylov subspace, providing superior numerical stability; and (3) efficient implementation for AMD GPUs using HIP. We prove that our streaming approach achieves a backward error near…
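The projection step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The function names `chebyshev_basis` and `gauss_seidel_sweep`, the explicit matrix `A`, the spectral bounds `lam_min` and `lam_max`, and the zero starting guess are all assumptions made for illustration. The sketch builds a Chebyshev-polynomial Krylov basis from a gradient-like vector, forms the small Gram system, and applies one forward Gauss-Seidel sweep to obtain projection coefficients.

```python
import numpy as np

def chebyshev_basis(A, g, s, lam_min, lam_max):
    """Return K with columns T_0(B) g, ..., T_{s-1}(B) g.

    B is A shifted and scaled so that [lam_min, lam_max] maps to [-1, 1].
    The spectral bounds are assumed to be known (hypothetical inputs).
    """
    center = 0.5 * (lam_max + lam_min)
    half_width = 0.5 * (lam_max - lam_min)

    def apply_B(v):
        return (A @ v - center * v) / half_width

    cols = [g, apply_B(g)]
    for _ in range(2, s):
        # Three-term recurrence: T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)
        cols.append(2.0 * apply_B(cols[-1]) - cols[-2])
    return np.stack(cols[:s], axis=1)

def gauss_seidel_sweep(G, b):
    """One forward Gauss-Seidel sweep on G c = b, starting from c = 0."""
    c = np.zeros_like(b)
    for i in range(len(b)):
        # Entries c[:i] were already updated earlier in this same sweep.
        c[i] = (b[i] - G[i, :i] @ c[:i]) / G[i, i]
    return c

# Illustrative problem (assumed, not from the paper): a diagonal,
# ill-conditioned operator and a random stand-in for a stochastic gradient.
rng = np.random.default_rng(0)
A = np.diag(np.linspace(1.0, 1000.0, 50))
g = rng.standard_normal(50)

K = chebyshev_basis(A, g, s=4, lam_min=1.0, lam_max=1000.0)
G = K.T @ K                    # s x s Gram matrix of the basis
b = K.T @ g                    # right-hand side of the projection system
c = gauss_seidel_sweep(G, b)   # approximate projection coefficients
direction = K @ c              # gradient expressed in the Krylov basis
```

With a zero starting guess, one forward sweep amounts to forward substitution with the lower triangle of the Gram matrix, so it costs O(s^2) operations for a basis of size s. How the resulting direction enters the SGD update is not specified in the abstract and is left out of this sketch.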
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research · Error Correcting Code Techniques
Methods: Stochastic Gradient Descent · Adam
