Learning Rate Scaling across LoRA Ranks and Transfer to Full Finetuning
Nan Chen, Soledad Villar, Soufiane Hayou

TL;DR
This paper develops a theoretical framework for how the optimal learning rate scales with LoRA adapter rank and how it transfers to full finetuning, reducing tuning costs across various tasks.
Contribution
Introduces Maximal-Update Adaptation (μA), a theoretical framework for learning rate scaling in LoRA, enabling transfer from LoRA to full finetuning and simplifying hyperparameter tuning.
Findings
Optimal learning rate scaling depends on initialization and scaling factors.
Two regimes are identified: the optimal learning rate is either invariant to rank or inversely proportional to rank.
Learning rate transfer from LoRA to full finetuning is effective across tasks.
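The two regimes above can be illustrated with a minimal sketch of how a learning rate tuned at one rank might be carried over to another. The function name `transfer_lr` and the regime labels `"invariant"` and `"inverse"` are hypothetical, chosen here for illustration. Which initialization and scaling-factor configuration falls into which regime is not stated in this summary, so the regime is left as an explicit input rather than inferred.

```python
def transfer_lr(lr_ref: float, rank_ref: int, rank_new: int, regime: str) -> float:
    """Rescale a learning rate tuned at rank_ref for use at rank_new.

    regime is a hypothetical label (not from the paper):
      "invariant": the optimal learning rate does not change with rank
      "inverse":   the optimal learning rate scales as 1 / rank
    """
    if rank_ref <= 0 or rank_new <= 0:
        raise ValueError("ranks must be positive integers")
    if regime == "invariant":
        return lr_ref
    if regime == "inverse":
        # lr * rank is held constant, so lr_new = lr_ref * rank_ref / rank_new
        return lr_ref * rank_ref / rank_new
    raise ValueError(f"unknown regime: {regime!r}")


# Example: a learning rate tuned at rank 8, reused at rank 64.
lr_invariant = transfer_lr(1e-4, 8, 64, "invariant")
lr_inverse = transfer_lr(1e-4, 8, 64, "inverse")
```

Under the inverse regime, moving from rank 8 to rank 64 divides the learning rate by 8. Under the invariant regime it is reused unchanged.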
Abstract
Low-Rank Adaptation (LoRA) is a standard tool for parameter-efficient finetuning of large models. While it induces a small memory footprint, its training dynamics can be surprisingly complex as they depend on several hyperparameters such as initialization, adapter rank, and learning rate. In particular, it is unclear how the optimal learning rate scales with adapter rank, which forces practitioners to re-tune the learning rate whenever the rank is changed. In this paper, we introduce Maximal-Update Adaptation (μA), a theoretical framework that characterizes how the "optimal" learning rate should scale with model width and adapter rank to produce stable, non-vanishing feature updates under standard configurations. μA is inspired by the Maximal-Update Parametrization (μP) in pretraining. Our analysis leverages techniques from hyperparameter transfer and reveals that the…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
