A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models

Yiming Chen; Yuan Zhang; Yin Liu; Kun Yuan; Zaiwen Wen

arXiv:2502.07222·cs.LG·February 12, 2025

TL;DR

This paper presents a novel randomized subspace optimization method that significantly reduces memory usage for training large language models, addressing both optimizer states and activations, with proven convergence guarantees and competitive performance.

Contribution

It introduces a new framework that decomposes high-dimensional training into lower-dimensional subproblems, improving memory efficiency and providing theoretical convergence analysis.

Findings

01 Reduces the memory footprint of both activations and optimizer states.

02 Achieves performance comparable to that of existing methods such as GaLore and Adam.

03 Demonstrates superior memory and communication efficiency in experiments.
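
As a rough illustration of why optimizer states matter for memory, the following back-of-the-envelope sketch counts Adam's state for a single weight matrix. It relies only on the fact that Adam keeps two moment buffers per optimized entry. The matrix shape (4096 x 4096) and the subspace dimension (256) are hypothetical values chosen for illustration, not figures from the paper, and activation memory is not modeled here.

```python
def adam_state_elements(rows: int, cols: int) -> int:
    """Number of values Adam stores for a rows x cols variable.

    Adam keeps a first-moment and a second-moment buffer,
    each the same shape as the variable being optimized.
    """
    return 2 * rows * cols


# Hypothetical sizes, chosen only for illustration.
m, n, r = 4096, 4096, 256

# Adam applied directly to the full m x n weight matrix.
full_state = adam_state_elements(m, n)

# Adam applied to an assumed r x n low-dimensional variable instead.
subspace_state = adam_state_elements(r, n)

reduction = full_state / subspace_state
print(full_state, subspace_state, reduction)
```

Under these assumed sizes the optimizer state shrinks by the factor m / r; the actual savings reported in the paper depend on its own configuration.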

Abstract

The memory challenges associated with training Large Language Models (LLMs) have become a critical concern, particularly when using the Adam optimizer. To address this issue, numerous memory-efficient techniques have been proposed, with GaLore standing out as a notable example designed to reduce the memory footprint of optimizer states. However, these approaches do not alleviate the memory burden imposed by activations, rendering them unsuitable for scenarios involving long context sequences or large mini-batches. Moreover, their convergence properties are still not well understood in the literature. In this work, we introduce a Randomized Subspace Optimization framework for pre-training and fine-tuning LLMs. Our approach decomposes the high-dimensional training problem into a series of lower-dimensional subproblems. At each iteration, a random subspace is selected, and the parameters…
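
The idea of optimizing within a randomly chosen subspace can be sketched as follows. This is a minimal NumPy illustration, not the authors' algorithm. It assumes the update is parameterized as W + P B with a Gaussian random matrix P, it uses plain gradient descent rather than Adam for the inner subproblem, and it runs on a toy quadratic loss. All names (`random_subspace_minimize`, `loss_grad`, `rank`) are hypothetical.

```python
import numpy as np


def random_subspace_minimize(loss_grad, W, rank, outer_steps=30,
                             inner_steps=20, lr=0.05, seed=0):
    """Illustrative random-subspace loop (an assumption, not the paper's exact method)."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    for _ in range(outer_steps):
        # Draw a fresh random matrix P (m x rank); the scaling gives E[P P^T] = I.
        P = rng.standard_normal((m, rank)) / np.sqrt(rank)
        # Low-dimensional variable B (rank x n); only B changes in this subproblem.
        B = np.zeros((rank, n))
        for _ in range(inner_steps):
            # Chain rule: gradient w.r.t. B is P^T times the gradient w.r.t. W.
            grad_B = P.T @ loss_grad(W + P @ B)
            B -= lr * grad_B
        # Fold the subspace solution back into the full parameters.
        W = W + P @ B
    return W


# Toy problem: minimize 0.5 * ||W - T||_F^2, whose gradient is W - T.
rng = np.random.default_rng(1)
T = rng.standard_normal((20, 8))
W0 = np.zeros_like(T)
W_final = random_subspace_minimize(lambda W: W - T, W0, rank=4)

initial_loss = 0.5 * np.sum((W0 - T) ** 2)
final_loss = 0.5 * np.sum((W_final - T) ** 2)
print(initial_loss, final_loss)
```

In this sketch only the rank x n variable B is updated inside each subproblem, so any per-variable optimizer state would scale with the subspace dimension rather than with the full matrix.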

Videos

A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models (SlidesLive)

Taxonomy

Topics: Text and Document Classification Technologies

Methods: Adam