FOAM: Blocked State Folding for Memory-Efficient LLM Training

Ziqing Wen; Jiahuan Wang; Ping Luo; Dongsheng Li; Tao Sun

arXiv:2512.07112 · cs.LG · May 14, 2026

TL;DR

FOAM is a memory-efficient optimizer for LLM training that compresses optimizer states, cutting optimizer memory overhead by up to 90% while preserving Adam-level convergence and model performance.

Contribution

The paper introduces FOAM, a novel optimizer that compresses optimizer states with block-wise gradient means and residual correction, achieving memory savings without performance loss.

Findings

1. Eliminates up to 90% of optimizer memory overhead.

2. Accelerates convergence compared to standard Adam.

3. Compatible with other memory-efficient optimizers, matching or surpassing their performance.

Abstract

Large language models (LLMs) have demonstrated remarkable performance due to their large parameter counts and extensive training data. However, their scale leads to significant memory bottlenecks during training, especially when using memory-intensive optimizers like Adam. Existing memory-efficient approaches often rely on techniques such as singular value decomposition (SVD), projections, or weight freezing, which can introduce substantial computational overhead, require additional memory for projections, or degrade model performance. In this paper, we propose Folded Optimizer with Approximate Moment (FOAM), a method that compresses optimizer states by computing block-wise gradient means and incorporates a residual correction to recover lost information. Theoretically, FOAM achieves convergence rates equivalent to vanilla Adam under standard non-convex optimization settings.…
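The folding idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names (`fold`, `unfold`, `folded_adam_step`), the block size, and in particular the way the residual is added back into the update are assumptions made for illustration only. The sketch keeps the Adam moments at the folded (block-mean) size and reuses the current gradient to form the residual, so no extra persistent state is stored.

```python
import numpy as np

def fold(g, block):
    """Compress a 1-D gradient into block-wise means.

    Assumes len(g) is divisible by `block` (hypothetical simplification).
    """
    return g.reshape(-1, block).mean(axis=1)

def unfold(s, block):
    """Expand a folded vector back to full length by repeating each entry."""
    return np.repeat(s, block)

def folded_adam_step(w, g, m, v, t, block=4, lr=1e-3,
                     b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-style step whose moments m, v live at folded size.

    The residual handling below is an illustrative assumption, not the
    update rule from the paper.
    """
    g_fold = fold(g, block)
    # Information lost by replacing each block with its mean.
    residual = g - unfold(g_fold, block)

    # Standard Adam moment updates, applied to the folded gradient.
    m = b1 * m + (1 - b1) * g_fold
    v = b2 * v + (1 - b2) * g_fold ** 2

    # Bias correction as in Adam (t is the 1-based step count).
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)

    # Unfold the moments and add the residual back to the numerator
    # (assumed form of the "residual correction").
    update = (unfold(m_hat, block) + residual) / (np.sqrt(unfold(v_hat, block)) + eps)
    return w - lr * update, m, v

# Usage: 8 parameters, moments stored at 8 / 4 = 2 entries each.
w = np.zeros(8)
g = np.arange(8, dtype=float)
m = np.zeros(2)
v = np.zeros(2)
w, m, v = folded_adam_step(w, g, m, v, t=1, block=4)
```

In this sketch each moment buffer is `block` times smaller than the parameter vector, which is where the memory saving would come from; the trade-off is that entries within a block share one moment estimate, and the residual term is what restores per-entry information.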

Code & Models

Repositories

zqOuO/FOAM (GitHub)
