TL;DR
This paper introduces a geometric framework for momentum-based optimizers tailored to low-rank neural network training. It targets the convergence difficulties that classical momentum methods face on low-rank parameterizations, and aims to improve training efficiency by building the intrinsic geometry of the parameter space into the optimizer.
Contribution
The paper proposes momentum-based optimization strategies, derived from dynamical low-rank approximation, that respect the geometric structure of low-rank training landscapes.
Findings
Faster convergence in low-rank training scenarios
Stronger validation metrics at fixed parameter budgets
Classical momentum methods can struggle to converge to a local optimum because of the geometry of the low-rank optimization landscape
Abstract
Low-rank pre-training and fine-tuning have recently emerged as promising techniques for reducing the computational and storage costs of large neural networks. Training low-rank parameterizations typically relies on conventional optimizers such as heavy ball momentum methods or Adam. In this work, we identify and analyze potential difficulties that these training methods encounter when used to train low-rank parameterizations of weights. In particular, we show that classical momentum methods can struggle to converge to a local optimum due to the geometry of the underlying optimization landscape. To address this, we introduce novel training strategies derived from dynamical low-rank approximation, which explicitly account for the underlying geometric structure. Our approach leverages and combines tools from dynamical low-rank approximation and momentum-based optimization to design…
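To make the baseline setting concrete, the following is a minimal sketch of the conventional approach the abstract refers to: heavy-ball momentum applied directly to the factors of a low-rank parameterization W ≈ U Vᵀ. It is not the paper's proposed method. The function name, the least-squares objective, and all hyperparameter values are assumptions chosen for illustration only.

```python
import numpy as np


def heavy_ball_low_rank(target, rank, lr=0.005, beta=0.9, steps=1000, seed=0):
    """Fit target ~ U @ V.T with heavy-ball momentum on the factors.

    Illustrative baseline only (hypothetical helper, not from the paper):
    momentum is kept per factor and ignores the geometry of the set of
    rank-`rank` matrices.
    """
    rng = np.random.default_rng(seed)
    m, n = target.shape
    # Small random initialization of both factors.
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    # One momentum buffer per factor.
    mom_U = np.zeros_like(U)
    mom_V = np.zeros_like(V)
    losses = []
    for _ in range(steps):
        residual = U @ V.T - target
        losses.append(0.5 * float(np.sum(residual ** 2)))
        # Gradients of 0.5 * ||U V^T - target||_F^2 w.r.t. U and V.
        grad_U = residual @ V
        grad_V = residual.T @ U
        # Heavy-ball update: accumulate gradients, then step.
        mom_U = beta * mom_U + grad_U
        mom_V = beta * mom_V + grad_V
        U = U - lr * mom_U
        V = V - lr * mom_V
    return U, V, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic rank-2 target matrix (assumed toy problem).
    target = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
    U, V, losses = heavy_ball_low_rank(target, rank=2)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Here the momentum buffers live in the factor spaces of U and V rather than on the manifold of fixed-rank matrices, which is the kind of geometry-agnostic update the abstract contrasts with approaches based on dynamical low-rank approximation.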