AdaFRUGAL: Adaptive Memory-Efficient Training with Dynamic Control
Quang-Hung Bui, Anh Son Ta

TL;DR
AdaFRUGAL introduces dynamic control mechanisms to optimize memory and computational efficiency during large language model training, reducing resource usage while maintaining performance.
Contribution
It automates the tuning of FRUGAL's static hyperparameters (the subspace ratio and the update frequency) with dynamic controls, making LLM training more adaptable and efficient.
Findings
Achieves significant GPU memory reduction
Reduces training time compared to static methods
Maintains competitive performance on large-scale tasks
Abstract
Training Large Language Models (LLMs) is highly memory-intensive because of optimizer state overhead. The FRUGAL framework mitigates this with gradient splitting, but its static hyperparameters, the subspace ratio (ρ) and the update frequency (T), require costly manual tuning, which limits adaptability. We present AdaFRUGAL, which automates this process by introducing two dynamic controls: (i) a linear decay for ρ that progressively reduces memory, and (ii) a loss-aware schedule for T that lowers computational overhead. Experiments across large-scale pre-training (English C4, Vietnamese VietVault) and fine-tuning (GLUE) show that AdaFRUGAL achieves a compelling trade-off: it maintains competitive performance against AdamW and static FRUGAL while significantly reducing both GPU memory and training time, offering a more practical, autonomous solution for resource-constrained LLM…
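The two dynamic controls described in the abstract can be sketched in Python. This is a minimal, hypothetical illustration, not the authors' implementation. The names `linear_rho` and `LossAwareT`, the start/end values, and especially the loss-aware rule for the update frequency T are assumptions, because the abstract gives no exact formulas. Here the subspace ratio ρ decays linearly over a fixed horizon, and T is multiplied by a growth factor (up to a cap) whenever the relative loss improvement between checks falls below a tolerance.

```python
from dataclasses import dataclass
from typing import Optional


def linear_rho(step: int, total_steps: int, rho_start: float, rho_end: float) -> float:
    """Linearly interpolate the subspace ratio rho from rho_start to rho_end.

    Hypothetical schedule: rho reaches rho_end at total_steps and stays there.
    """
    if total_steps <= 0:
        return rho_end
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return rho_start + (rho_end - rho_start) * frac


@dataclass
class LossAwareT:
    """Hypothetical loss-aware controller for the update frequency T.

    ASSUMPTION (not from the paper): if the relative loss improvement since the
    previous check is below `tol`, training is treated as stable and T grows by
    `growth` (capped at `t_max`), so the subspace is refreshed less often.
    Otherwise T is left unchanged.
    """
    t: int = 200
    t_max: int = 1600
    growth: int = 2
    tol: float = 0.01
    prev_loss: Optional[float] = None

    def update(self, loss: float) -> int:
        if self.prev_loss is not None and self.prev_loss > 0:
            rel_improvement = (self.prev_loss - loss) / self.prev_loss
            if rel_improvement < self.tol:
                self.t = min(self.t * self.growth, self.t_max)
        self.prev_loss = loss
        return self.t


if __name__ == "__main__":
    ctrl = LossAwareT(t=200, t_max=1600, growth=2, tol=0.01)
    for step, loss in [(0, 3.0), (200, 2.5), (400, 2.49)]:
        rho = linear_rho(step, total_steps=1000, rho_start=0.25, rho_end=0.05)
        t = ctrl.update(loss)
        print(f"step={step} rho={rho:.3f} T={t}")
```

In a training loop, one would recompute ρ as training progresses and refresh the subspace whenever `step % T == 0`, calling the controller with a (preferably smoothed) loss at each refresh. Growing T only after loss progress slows is one plausible way to trade fewer subspace refreshes for lower overhead. The paper's actual rule may differ.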
