On the Duality between Gradient Transformations and Adapters
Lucas Torroba-Hennigen, Hunter Lang, Han Guo, Yoon Kim

TL;DR
This paper shows that training a neural network with linear gradient transformations is equivalent to training a linear adapter that additively reparameterizes the model. This duality unifies existing memory-efficient training methods and suggests new ways to improve training efficiency.
Contribution
It establishes a theoretical equivalence between gradient transformations and adapter reparameterizations, unifying existing memory-efficient training approaches and guiding future method development.
Findings
Linear gradient transformations are equivalent to adapter reparameterizations.
When the transformation is Kronecker-factored, the duality yields an equivalence between GaLore and one-sided LoRA.
The duality unifies and extends memory-efficient training techniques.
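The Kronecker-factored case can be checked numerically for a single weight matrix. The sketch below is illustrative only and is not taken from the paper: the shapes `m`, `n`, `r`, the random projection `P`, and the stand-in gradient `G` are all assumptions, and `P` is held fixed (GaLore as usually described also refreshes its projection periodically, which is not modeled here). It shows that left-projecting a gradient matrix is a linear map on the flattened gradient with Kronecker structure, and that mapping back with the transpose produces an update of the form `P @ B`, the form a LoRA adapter with one frozen factor would produce.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2  # hypothetical weight shape (m x n) and rank r

G = rng.standard_normal((m, n))                   # stand-in gradient of the loss w.r.t. W
P, _ = np.linalg.qr(rng.standard_normal((m, r)))  # fixed projection with orthonormal columns

def vec(X):
    """Column-major flattening, so that vec(A X B) = kron(B.T, A) @ vec(X)."""
    return X.reshape(-1, order="F")

# The linear gradient transformation written as one matrix acting on vec(G).
S = np.kron(np.eye(n), P.T)  # shape (r*n, m*n)

# Projecting the gradient matrix from the left is the same map as S.
low_dim_grad = P.T @ G  # shape (r, n)
kron_matches = np.allclose(S @ vec(G), vec(low_dim_grad))

# Going back through the transpose gives an update of the form P @ B.
# By the chain rule, P.T @ G is also the gradient w.r.t. B when W = W0 + P @ B.
back_matches = np.allclose(S.T @ vec(low_dim_grad), vec(P @ low_dim_grad))
```

Under these assumptions both `kron_matches` and `back_matches` should evaluate to `True`; the second check is the matrix form of "going back via the linear map's transpose".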
Abstract
We study memory-efficient optimization of neural networks (in particular language models) with linear gradient transformations, where the gradients are linearly mapped to a lower dimensional space than the full parameter space, thus saving memory required for gradient accumulation and optimizer state persistence. The model parameters are updated by first performing an optimization step in the lower dimensional space and then going back into the original parameter space via the linear map's transpose. We show that optimizing the model in this transformed space is equivalent to reparameterizing the original model through a linear adapter that additively modifies the model parameters, and then only optimizing the adapter's parameters. When the transformation is Kronecker-factored, this establishes an equivalence between GaLore and one-sided LoRA. We show that this duality between gradient…
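A minimal sketch of the equivalence described in the abstract, under assumptions that are not from the paper: a toy least-squares loss, a fixed random map `S` with orthonormal rows, and SGD with momentum as the optimizer. The names `train_gradient_transform` and `train_adapter` are hypothetical. One loop transforms the gradient, steps in the low-dimensional space, and maps the step back with `S.T`; the other freezes the original parameters and trains only an additive linear adapter initialized at zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 3                       # full and reduced dimensions (illustrative)
steps, lr, beta = 20, 0.01, 0.9   # hypothetical optimizer settings

# Toy least-squares loss L(theta) = 0.5 * ||A @ theta - b||^2.
A = rng.standard_normal((12, n))
b = rng.standard_normal(12)
theta0 = rng.standard_normal(n)

# Fixed linear map S (r x n) with orthonormal rows.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
S = Q.T

def grad(theta):
    """Gradient of the toy loss with respect to the full parameters."""
    return A.T @ (A @ theta - b)

def train_gradient_transform():
    """Transform the gradient, step in the low-dimensional space, map back via S.T."""
    theta = theta0.copy()
    momentum = np.zeros(r)        # optimizer state lives in the r-dimensional space
    for _ in range(steps):
        g_low = S @ grad(theta)
        momentum = beta * momentum + g_low
        theta = theta + S.T @ (-lr * momentum)
    return theta

def train_adapter():
    """Freeze theta0 and train only phi in theta = theta0 + S.T @ phi."""
    phi = np.zeros(r)             # adapter starts at zero, so the model starts at theta0
    momentum = np.zeros(r)
    for _ in range(steps):
        g_phi = S @ grad(theta0 + S.T @ phi)  # chain rule through the adapter
        momentum = beta * momentum + g_phi
        phi = phi - lr * momentum
    return theta0 + S.T @ phi

theta_gt = train_gradient_transform()
theta_ad = train_adapter()
max_gap = float(np.max(np.abs(theta_gt - theta_ad)))
```

In this setting the two procedures see the same sequence of low-dimensional gradients, so `max_gap` should be at the level of floating-point rounding. The same argument applies to any optimizer whose state depends only on those gradients.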
Taxonomy
Topics: Matrix Theory and Algorithms · Advanced Differential Equations and Dynamical Systems · Mathematics and Applications
Methods: Adapter
