Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining
Haochen Zhang, Junze Yin, Guanchu Wang, Zirui Liu, Lin F. Yang, Tianyi Zhang, Anshumali Shrivastava, Vladimir Braverman

TL;DR
This paper introduces an importance sampling method for low-rank optimization in large language model pretraining. It targets a limitation of dominant-subspace approaches, whose selected subspace stops changing during pretraining, and it reports better empirical results.
Contribution
It proposes a novel importance sampling technique with convergence guarantees for low-rank optimization in LLM pretraining, improving over existing dominant subspace methods.
Findings
Significantly outperforms previous low-rank methods in LLM pretraining tasks.
Provides a provable convergence guarantee for the proposed importance sampling approach.
Addresses the issue of dominant subspace stagnation during pretraining.
Abstract
Low-rank optimization has emerged as a promising approach to enabling memory-efficient training of large language models (LLMs). Existing low-rank optimization methods typically project gradients onto a low-rank subspace, reducing the memory cost of storing optimizer states. A key challenge in these methods is selecting suitable subspaces to ensure an effective optimization trajectory. Most existing approaches select the dominant subspace to preserve gradient information, as this intuitively provides the best approximation. However, we find that in practice, the dominant subspace stops changing during pretraining, thereby constraining weight updates to similar subspaces. In this paper, we propose importance sampling for low-rank optimization in LLM pretraining with a provable convergence guarantee, which the dominant subspace approach does not have. Empirically, we demonstrate that our…
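The abstract contrasts two ways of choosing the projection subspace for a gradient. The NumPy sketch below illustrates that contrast under stated assumptions. All function names are hypothetical. The projection uses a GaLore-style left projection, P^T G. The sampling weights, proportional to singular values, are assumed for illustration and may not match the distribution the paper uses.

```python
import numpy as np


def dominant_subspace(grad: np.ndarray, rank: int) -> np.ndarray:
    """Top-`rank` left singular vectors of the gradient (dominant-subspace choice)."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]


def sampled_subspace(grad: np.ndarray, rank: int, rng: np.random.Generator) -> np.ndarray:
    """Sample `rank` distinct left singular vectors.

    ASSUMPTION: selection probability is proportional to the singular value.
    This is an illustrative weighting, not necessarily the paper's distribution.
    """
    u, s, _ = np.linalg.svd(grad, full_matrices=False)
    probs = s / s.sum()
    # Sampling without replacement keeps the chosen columns orthonormal and
    # can select non-dominant directions, so the subspace can keep changing
    # even when the top singular vectors do not.
    idx = rng.choice(len(s), size=rank, replace=False, p=probs)
    return u[:, idx]


def project(grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Low-rank gradient of shape (rank, n); optimizer states would be kept at this size."""
    return basis.T @ grad


def project_back(low_rank: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map a (rank, n) update back to the full (m, n) weight shape."""
    return basis @ low_rank


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.standard_normal((64, 32))  # stand-in for one weight matrix's gradient
    P_dom = dominant_subspace(G, rank=4)
    P_imp = sampled_subspace(G, rank=4, rng=rng)
    print(project(G, P_imp).shape)  # (4, 32)
```

The dominant choice is a deterministic function of the gradient. If the top singular directions stop changing, the projection stops changing too. The sampled choice can still pick lower-ranked directions. The trade-off is a larger per-step approximation error, since the Eckart-Young theorem guarantees that the dominant choice minimizes that error.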
