Memory-Efficient LLM Training with Online Subspace Descent

Kaizhao Liang; Bo Liu; Lizhang Chen; Qiang Liu

arXiv:2408.12857 · cs.LG · August 26, 2024

TL;DR

This paper introduces Online Subspace Descent, a memory-efficient optimizer with convergence guarantees. It replaces SVD with online PCA when updating the projection matrix, which yields lower perplexity and better performance in large language model training.

Contribution

The paper provides the first convergence guarantee for arbitrary update rules of the projection matrix in low-rank gradient methods, and proposes a new optimizer that uses online PCA for efficient LLM training.

Findings

01  Online Subspace Descent outperforms existing low-rank methods in LLaMA pretraining.

02  It achieves lower perplexity and better downstream task performance.

03  It narrows the gap with full-rank training methods.

Abstract

Recently, a wide range of memory-efficient LLM training algorithms have gained substantial popularity. These methods leverage the low-rank structure of gradients to project optimizer states into a subspace using a projection matrix found by singular value decomposition (SVD). However, the convergence of these algorithms is highly dependent on the update rules of their projection matrix. In this work, we provide the first convergence guarantee for arbitrary update rules of the projection matrix. This guarantee is generally applicable to optimizers that can be analyzed with Hamiltonian Descent, including most common ones, such as LION and Adam. Inspired by our theoretical understanding, we propose Online Subspace Descent, a new family of subspace descent optimizers without SVD. Instead of updating the projection matrix with eigenvectors, Online Subspace Descent updates the projection matrix with…
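The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, plain momentum stands in for Adam or LION, and the online PCA rule shown (one gradient step on the reconstruction error, followed by QR re-orthonormalization) is only an assumed stand-in for whatever update the paper actually uses. The sketch shows the two pieces the abstract describes: an optimizer state kept in an r-dimensional subspace, and a projection matrix refreshed without SVD.

```python
import numpy as np

def svd_projection(G, rank):
    # SVD-based choice: top-`rank` left singular vectors of the gradient.
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :rank]

def online_pca_step(P, G, lr=0.5):
    # ASSUMED rule (illustrative only): one gradient step on
    # 0.5 * ||G - P P^T G||_F^2 with respect to P, scaled by ||G||_F^2,
    # then QR to restore orthonormal columns.
    R = G - P @ (P.T @ G)                      # reconstruction residual
    grad = -(R @ (G.T @ P) + G @ (R.T @ P))    # gradient of the loss w.r.t. P
    P = P - lr * grad / (np.linalg.norm(G) ** 2 + 1e-12)
    Q, _ = np.linalg.qr(P)
    return Q

def subspace_momentum_step(W, G, P, M, lr=1e-2, beta=0.9):
    # The optimizer state M has shape (r, n) instead of (m, n).
    g_low = P.T @ G                            # project gradient into the subspace
    M = beta * M + (1.0 - beta) * g_low        # momentum kept in the subspace
    W = W - lr * (P @ M)                       # map the update back to full size
    return W, M

def residual(P, G):
    # Part of G that the subspace spanned by P fails to capture.
    return np.linalg.norm(G - P @ (P.T @ G))

rng = np.random.default_rng(0)
m, n, r = 32, 16, 4
G = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r "gradient"
P, _ = np.linalg.qr(rng.standard_normal((m, r)))               # random orthonormal start

initial_residual = residual(P, G)
for _ in range(300):
    P = online_pca_step(P, G)
final_residual = residual(P, G)

W = rng.standard_normal((m, n))
M = np.zeros((r, n))
W_new, M = subspace_momentum_step(W, G, P, M)
```

In this toy setting the gradient is held fixed, so the repeated online steps simply drive the projection toward the subspace that `svd_projection` would return directly. In training the gradient changes every step, and the appeal of an online rule is that each refresh costs a few matrix products rather than a full decomposition.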

Code & Models

Repositories

kyleliang919/online-subspace-descent
PyTorch · Official

Videos

Memory-Efficient LLM Training with Online Subspace Descent · SlidesLive

Taxonomy

Topics: Digital Rights Management and Security

Methods: LLaMA · Principal Components Analysis · Adam