AdaFRUGAL: Adaptive Memory-Efficient Training with Dynamic Control

Quang-Hung Bui; Anh Son Ta

arXiv:2601.11568 · cs.LG · April 30, 2026

TL;DR

AdaFRUGAL replaces the fixed hyperparameters of the FRUGAL framework with dynamic controls, reducing GPU memory and training time in large language model training while keeping performance competitive.

Contribution

It automates the tuning of FRUGAL's two static hyperparameters, the subspace ratio ($\rho$) and the update frequency ($T$), replacing costly manual search with dynamic controls that make LLM training more adaptable and efficient.

Findings

01. Achieves significant GPU memory reduction
02. Reduces training time compared with static FRUGAL
03. Maintains performance competitive with AdamW and static FRUGAL on large-scale pre-training (C4, VietVault) and GLUE fine-tuning

Abstract

Training Large Language Models (LLMs) is highly memory-intensive due to optimizer state overhead. The FRUGAL framework mitigates this with gradient splitting, but its static hyperparameters, the subspace ratio ($\rho$) and the update frequency ($T$), require costly manual tuning, limiting adaptability. We present AdaFRUGAL, which automates this process by introducing two dynamic controls: (i) a linear decay for $\rho$ to progressively reduce memory, and (ii) a loss-aware schedule for $T$ to lower computational overhead. Experiments across large-scale pre-training (English C4, Vietnamese VietVault) and fine-tuning (GLUE) demonstrate that AdaFRUGAL achieves a compelling trade-off. It maintains competitive performance against AdamW and static FRUGAL while significantly reducing both GPU memory and training time, offering a more practical, autonomous solution for resource-constrained LLM…
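The two dynamic controls can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the decay endpoints for $\rho$, nor the exact loss-aware rule for $T$, nor any thresholds. The names linear_rho, LossAwareT and adam_state_bytes, the "double $T$ when the windowed loss plateaus" rule, and all default values are therefore hypothetical. The memory helper further assumes FRUGAL-style splitting in which only the $\rho$ fraction of parameters keeps AdamW's two moment buffers and the remainder is updated without optimizer state.

```python
from dataclasses import dataclass
from typing import Optional


def linear_rho(step: int, total_steps: int, rho_start: float, rho_end: float) -> float:
    """Linearly decay the subspace ratio rho from rho_start to rho_end.

    Endpoints are hypothetical; progress is clamped to [0, 1] so rho
    stays at rho_end once step >= total_steps.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return rho_start + (rho_end - rho_start) * progress


@dataclass
class LossAwareT:
    """Hypothetical loss-aware controller for the subspace update period T.

    Assumed rule: if the mean loss of the latest window differs by less than
    `tol` (relative) from the previous window, treat training as plateaued
    and double T, capped at t_max. T never shrinks in this sketch.
    """
    t: int = 200
    t_max: int = 1600
    tol: float = 0.01
    last_loss: Optional[float] = None

    def update(self, window_loss: float) -> int:
        if self.last_loss is not None:
            rel_change = (self.last_loss - window_loss) / abs(self.last_loss)
            if abs(rel_change) < self.tol:
                self.t = min(self.t * 2, self.t_max)
        self.last_loss = window_loss
        return self.t


def adam_state_bytes(num_params: int, rho: float, bytes_per_value: int = 4) -> int:
    """Approximate optimizer-state memory under assumed FRUGAL-style splitting.

    Two fp32 AdamW moments per parameter in the state-full subspace (a rho
    fraction), no state for the rest; rho=1.0 corresponds to plain AdamW.
    """
    return round(2 * rho * num_params * bytes_per_value)


if __name__ == "__main__":
    total_steps = 1000
    ctrl = LossAwareT(t=100, t_max=400)
    next_refresh, window = 0, []
    for step in range(total_steps):
        loss = 3.0 / (1 + step / 100)  # synthetic decaying loss, illustration only
        window.append(loss)
        if step == next_refresh:
            rho = linear_rho(step, total_steps, rho_start=0.25, rho_end=0.05)
            t = ctrl.update(sum(window) / len(window))
            window.clear()
            next_refresh = step + t
            # A real optimizer would re-select the state-full subspace here.
            print(f"step {step}: rho={rho:.3f}, T={t}, "
                  f"state={adam_state_bytes(1_000_000, rho)} bytes per 1M params")
```

Lengthening $T$ only when the windowed loss stops moving is one simple way to make the refresh schedule loss-aware: under this assumption the subspace is re-selected often while the loss drops quickly and less often later, which is where savings in overhead would come from.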
