Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees
Thien Hang Nguyen, Huy Le Nguyen

TL;DR
This paper presents two novel optimization techniques, Subset-Norm and Subspace-Momentum, that significantly reduce memory usage and accelerate training of large neural networks with proven convergence guarantees.
Contribution
The paper introduces Subset-Norm and Subspace-Momentum, new methods that lower memory requirements and improve training efficiency for large-scale neural networks with theoretical convergence proofs.
Findings
Subset-Norm reduces AdaGrad's memory from O(d) to O(√d).
Combining the two methods achieves similar validation perplexity with less memory.
Empirical results show over 80% memory reduction with minimal tuning.
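The O(d) to O(√d) reduction can be illustrated with a minimal sketch of step-size sharing. It is not the authors' implementation: the function names, the contiguous partition into blocks, the learning rate, and the toy quadratic objective are all assumptions made for illustration. Each block of `subset_size` coordinates shares one accumulator, so choosing `subset_size` near √d leaves about √d accumulators instead of d.

```python
import math


def make_state(d, subset_size):
    """One accumulator per subset of coordinates (hypothetical helper)."""
    return [0.0] * math.ceil(d / subset_size)


def subset_norm_adagrad_step(x, grad, acc, subset_size, lr=0.1, eps=1e-8):
    """One AdaGrad-style step in which each subset shares a single step size.

    x, grad: lists of length d. acc: list from make_state, updated in place.
    The contiguous partition is an assumption of this sketch.
    """
    for j in range(len(acc)):
        lo = j * subset_size
        hi = min(lo + subset_size, len(x))
        # Accumulate the squared norm of this subset's gradient.
        acc[j] += sum(g * g for g in grad[lo:hi])
        step = lr / (math.sqrt(acc[j]) + eps)
        for i in range(lo, hi):
            x[i] -= step * grad[i]
    return x


# Toy usage: minimize 0.5 * ||x||^2, whose gradient is x itself.
d = 16
k = math.isqrt(d)          # subset size of about sqrt(d)
x = [1.0] * d
acc = make_state(d, k)     # sqrt(d) accumulators instead of d
for _ in range(200):
    grad = list(x)
    subset_norm_adagrad_step(x, grad, acc, k)
```

In this sketch, `subset_size = 1` keeps one accumulator per coordinate (the coordinate-wise AdaGrad case) and `subset_size = d` keeps a single accumulator (the AdaGrad-Norm case), which is the sense in which step-size sharing interpolates between the two.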
Abstract
We introduce two complementary techniques for efficient optimization that reduce memory requirements while accelerating training of large-scale neural networks. The first technique, Subset-Norm step size, generalizes AdaGrad-Norm and AdaGrad(-Coordinate) through step-size sharing. Subset-Norm (SN) reduces AdaGrad's memory footprint from O(d) to O(√d), where d is the model size. For non-convex smooth objectives under coordinate-wise sub-gaussian noise, we show a noise-adapted high-probability convergence guarantee with improved dimensional dependence of SN over existing methods. Our second technique, Subspace-Momentum, reduces the momentum state's memory footprint by restricting momentum to a low-dimensional subspace while performing SGD in the orthogonal complement. We prove a high-probability convergence result for Subspace-Momentum under standard assumptions. Empirical…
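The Subspace-Momentum idea described in the abstract can be sketched as follows. This is an illustrative reading, not the authors' algorithm: the exponential-moving-average form of momentum, the fixed orthonormal basis, the function names, and the hyperparameters are all assumptions. The momentum state holds only k coefficients (one per basis vector), and the part of the gradient outside the subspace is applied as a plain SGD step.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def subspace_momentum_step(x, grad, basis, m, lr=0.1, beta=0.9):
    """One step with momentum in span(basis) and SGD in its orthogonal complement.

    basis: list of k orthonormal vectors of length d (assumed given).
    m: list of k momentum coefficients, updated in place.
    """
    # Coordinates of the gradient inside the subspace.
    coeffs = [dot(b, grad) for b in basis]
    # Momentum is stored only for the k subspace coefficients.
    for j, c in enumerate(coeffs):
        m[j] = beta * m[j] + (1.0 - beta) * c
    for i in range(len(x)):
        in_subspace = sum(c * b[i] for c, b in zip(coeffs, basis))
        residual = grad[i] - in_subspace      # orthogonal-complement part
        momentum_part = sum(mj * b[i] for mj, b in zip(m, basis))
        x[i] -= lr * (momentum_part + residual)
    return x


# Toy usage: d = 4, subspace spanned by the first two standard basis vectors.
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
m = [0.0, 0.0]             # k = 2 momentum values instead of d = 4
x = [1.0, 1.0, 1.0, 1.0]
for _ in range(100):
    grad = list(x)         # gradient of 0.5 * ||x||^2
    subspace_momentum_step(x, grad, basis, m)
```

The memory saving in this sketch comes from `m` having length k rather than d. How the subspace is chosen is not stated in the text above, so the basis here is a placeholder.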
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
Methods: LLaMA · Stochastic Gradient Descent
