SubTrack++: Gradient Subspace Tracking for Scalable LLM Training
Sahar Rajabi, Nayeema Nonta, Sirisha Rambhatla

TL;DR
SubTrack++ tracks the gradient subspace using Grassmannian geometry and adds recovery scaling, cutting pre-training wall-time by up to 65% and fine-tuning time by 36% for large language models without increasing memory use.
Contribution
It presents a new approach combining Grassmannian subspace tracking with projection-aware optimizers and recovery scaling to improve LLM training efficiency.
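To make the idea of Grassmannian subspace tracking concrete, here is a minimal NumPy sketch of one geodesic step that moves an orthonormal basis toward a subspace that captures more of the current gradient. It is an illustration under stated assumptions, not the authors' implementation: the function name `grassmann_step`, the step size `eta`, and the choice of the projection error as the objective are all hypothetical, and the geodesic uses the standard closed form for the Grassmann manifold.

```python
import numpy as np

def grassmann_step(S, G, eta):
    """One geodesic step on the Grassmann manifold (illustrative sketch).

    S   : (m, r) orthonormal basis of the current gradient subspace
    G   : (m, n) full gradient matrix
    eta : step size along the geodesic (hypothetical hyperparameter)
    """
    G_low = S.T @ G                # (r, n) gradient expressed in the subspace
    R = G - S @ G_low              # residual; orthogonal to span(S)
    # Descent direction for ||G - S S^T G||_F^2, already in the tangent
    # space at S because S^T R = 0.
    H = 2.0 * R @ G_low.T          # (m, r)
    U, sigma, Vt = np.linalg.svd(H, full_matrices=False)
    # Closed-form geodesic starting at S with initial velocity H.
    cos_term = S @ Vt.T @ np.diag(np.cos(sigma * eta))
    sin_term = U @ np.diag(np.sin(sigma * eta))
    return (cos_term + sin_term) @ Vt
```

Because the update follows a geodesic, the returned basis stays orthonormal without a separate re-orthogonalization step, and for a small enough `eta` the projection error of `G` onto the new subspace decreases.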
Findings
Achieves up to 65% reduction in pre-training wall-time.
Reduces fine-tuning time by 36%.
Maintains the same memory footprint as existing methods.
Abstract
Training large language models (LLMs) is highly resource-intensive due to their massive number of parameters and the overhead of optimizer states. While recent work has aimed to reduce memory consumption, such efforts often entail trade-offs among memory efficiency, training time, and model performance. Yet, true democratization of LLMs requires simultaneous progress across all three dimensions. To this end, we propose SubTrack++ that leverages Grassmannian gradient subspace tracking combined with projection-aware optimizers, enabling Adam's internal statistics to adapt to subspace changes. Additionally, employing recovery scaling, a technique that restores information lost through low-rank projections, further enhances model performance. Our method demonstrates SOTA convergence by exploiting Grassmannian geometry, reducing pre-training wall-time by up to 65% and fine-tuning time by 36%…
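The abstract names two further ingredients: optimizer statistics that adapt when the subspace changes, and recovery scaling for the information lost in projection. The NumPy sketch below shows one plausible reading of each. Both functions are assumptions for illustration only: the abstract does not give formulas, so the first-moment rotation and the per-column norm-ratio scaling are stand-ins rather than the paper's exact rules, and all names (`rotate_first_moment`, `recovery_scaled_update`, `eps`) are hypothetical.

```python
import numpy as np

def rotate_first_moment(M_low, S_old, S_new):
    """Re-express a low-rank first moment in a new basis (assumed rule).

    M_low : (r, n) Adam first moment stored in the old subspace
    S_old : (m, r) previous orthonormal basis
    S_new : (m, r) updated orthonormal basis
    """
    # Lift to the full space with S_old, then project onto S_new.
    return S_new.T @ (S_old @ M_low)

def recovery_scaled_update(G, S, N_low, eps=1e-8):
    """Add back a rescaled copy of the discarded gradient component.

    G     : (m, n) full gradient
    S     : (m, r) orthonormal basis
    N_low : (r, n) optimizer output computed in the subspace
    """
    G_low = S.T @ G
    residual = G - S @ G_low       # part of G lost by the projection
    # Assumed heuristic: scale each residual column by how much the
    # optimizer rescaled the matching low-rank gradient column.
    scale = np.linalg.norm(N_low, axis=0) / (np.linalg.norm(G_low, axis=0) + eps)
    return S @ N_low + residual * scale
```

As a sanity check on the sketch, if the basis does not change, the rotated moment equals the original, and if the optimizer returns the low-rank gradient unchanged, the combined update reproduces the full gradient up to `eps`.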
Taxonomy
Topics: Target Tracking and Data Fusion in Sensor Networks · Neural Networks and Applications · EEG and Brain-Computer Interfaces
