Subspace Optimization for Large Language Models with Convergence Guarantees

Yutong He; Pengrui Li; Yipeng Hu; Chuyan Chen; Kun Yuan

arXiv:2410.11289 · cs.LG · June 5, 2025

TL;DR

This paper analyzes the convergence properties of subspace optimization algorithms like GaLore for large language models, identifies their limitations, and introduces a new variant, GoLore, with proven convergence guarantees in stochastic settings.

Contribution

We reveal that GaLore does not always converge and propose GoLore, a new algorithm with proven convergence guarantees for stochastic large language model training.

Findings

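1. GaLore can fail to converge in some cases; the paper gives an explicit counterexample.

2. GaLore converges when either the mini-batch size is sufficiently large or the gradient noise is isotropic.

3. GoLore provably converges in typical stochastic settings, even with standard batch sizes.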
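To make the first two findings concrete, here is a toy sketch in pure Python. It illustrates the idea and is not the paper's counterexample. Every setting in it is our assumption:

- rank r = 1 and an 8 × 8 gradient;
- a true gradient G = e0 wᵀ that lives in row 0;
- anisotropic noise confined to row 1 with scale `SIGMA`;
- a mini-batch noise mean sampled directly as N(0, I/B).

The GaLore-style projector is the top left singular vector of the noisy gradient, found by power iteration rather than a full SVD. The GoLore-style projector is a uniformly random unit vector drawn independently of the gradient; that is our assumption about GoLore's sampling, not a statement of its exact rule. The names `kept_fraction`, `top_left_singular_vector`, and `random_unit_vector` are hypothetical.

```python
import math
import random

M_ROWS, N_COLS = 8, 8  # hypothetical gradient shape (m x n)
SIGMA = 3.0            # hypothetical noise scale, applied to row 1 only (anisotropic noise)


def top_left_singular_vector(mat, rng, iters=30):
    """GaLore-style rank-1 projector: top left singular vector of `mat`,
    approximated by power iteration on mat @ mat^T."""
    m, n = len(mat), len(mat[0])
    u = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        v = [sum(mat[i][j] * u[i] for i in range(m)) for j in range(n)]  # v = mat^T u
        u = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = mat v
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def random_unit_vector(m, rng):
    """GoLore-style rank-1 projector (assumed): uniform on the unit sphere,
    drawn independently of the gradient."""
    u = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


def kept_fraction(batch_size, method, trials=1000, seed=0):
    """Average of ||u u^T G||_F^2 / ||G||_F^2 over `trials` noisy gradients."""
    rng = random.Random(seed)
    w = [1.0 / math.sqrt(N_COLS)] * N_COLS  # true gradient G = e0 w^T, ||G||_F = 1
    total = 0.0
    for _ in range(trials):
        # Mean of `batch_size` i.i.d. N(0, I) noise vectors ~ N(0, I / batch_size).
        z = [rng.gauss(0.0, 1.0 / math.sqrt(batch_size)) for _ in range(N_COLS)]
        noisy = [[0.0] * N_COLS for _ in range(M_ROWS)]
        noisy[0] = list(w)                   # signal: row 0
        noisy[1] = [SIGMA * zj for zj in z]  # noise: row 1 only
        if method == "svd":
            u = top_left_singular_vector(noisy, rng)
        elif method == "random":
            u = random_unit_vector(M_ROWS, rng)
        else:
            raise ValueError(f"unknown method: {method}")
        # Since G = e0 w^T with ||w|| = 1, ||u u^T G||_F^2 equals u[0] ** 2.
        total += u[0] ** 2
    return total / trials


if __name__ == "__main__":
    for b in (1, 10_000):
        print(f"batch={b:>6}  svd={kept_fraction(b, 'svd'):.3f}  "
              f"random={kept_fraction(b, 'random'):.3f}")
```

In this toy, the SVD-based projector should keep almost none of the true gradient's squared norm at batch size 1, because it locks onto the noisy row. At a very large batch size it should keep nearly all of it. The random projector keeps about r/m = 1/8 on average at any batch size: a smaller share, but one that is correctly oriented in expectation.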

Abstract

Subspace optimization algorithms, such as GaLore (Zhao et al., 2024), have gained attention for pre-training and fine-tuning large language models (LLMs) due to their memory efficiency. However, their convergence guarantees remain unclear, particularly in stochastic settings. In this paper, we reveal that GaLore does not always converge to the optimal solution and provide an explicit counterexample to support this finding. We further explore the conditions under which GaLore achieves convergence, showing that it does so when either (i) a sufficiently large mini-batch size is used or (ii) the gradient noise is isotropic. More significantly, we introduce GoLore (Gradient random Low-rank projection), a novel variant of GaLore that provably converges in typical stochastic settings, even with standard batch sizes. Our convergence analysis extends naturally to other subspace optimization…
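The abstract contrasts GaLore with GoLore (Gradient random Low-rank projection). A simplified, one-sided version of the subspace update helps locate that difference. In this sketch, η (step size), ρ_t (the inner optimizer, Adam in GaLore), τ (projector refresh period), and St(m, r) (matrices with r orthonormal columns) are our notation. The SVD rule follows GaLore (Zhao et al., 2024). The GoLore line encodes our assumption that its projector is sampled uniformly and independently of the gradient; the paper specifies the exact distribution.

```latex
\begin{aligned}
G_t &= \nabla F(W_t;\xi_t) \in \mathbb{R}^{m\times n}, \\
R_t &= P_t^\top G_t \in \mathbb{R}^{r\times n}, \qquad P_t \in \mathbb{R}^{m\times r},\; P_t^\top P_t = I_r, \\
W_{t+1} &= W_t - \eta\, P_t\, \rho_t(R_t), \\[4pt]
\text{GaLore:}\quad P_t &= U_{[:,\,1:r]}, \quad G_t = U \Sigma V^\top \ \text{(recomputed every } \tau \text{ steps)}, \\
\text{GoLore (assumed):}\quad P_t &\sim \mathrm{Unif}\big(\mathrm{St}(m,r)\big)\ \text{independent of } G_t
\;\Rightarrow\; \mathbb{E}\big[P_t P_t^\top\big] = \tfrac{r}{m}\, I_m .
\end{aligned}
```

The last identity gives one way to read the contrast. A projector drawn independently of G_t satisfies E[P_t P_t^⊤ G_t | G_t] = (r/m) G_t, so the projected gradient stays aligned with G_t in expectation. A projector built from the SVD of a noisy G_t carries no such guarantee.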

Code & Models

Repositories

pkumelon/golore
PyTorch · Official

Videos

Subspace Optimization for Large Language Models with Convergence Guarantees · SlidesLive