Learning Rate Scaling across LoRA Ranks and Transfer to Full Finetuning

Nan Chen; Soledad Villar; Soufiane Hayou

arXiv:2602.06204·cs.LG·February 9, 2026

TL;DR

This paper develops a theoretical framework for how the optimal learning rate scales with LoRA adapter rank and how it transfers to full finetuning, which reduces learning rate tuning costs across a range of tasks.

Contribution

Introduces Maximal-Update Adaptation ($μ$A), a theoretical framework for learning rate scaling in LoRA that enables learning rate transfer from LoRA to full finetuning and simplifies hyperparameter tuning.

Findings

01

How the optimal learning rate scales with rank depends on the adapter initialization and the LoRA scaling factor.

02

Two regimes are identified: the optimal learning rate is either invariant to rank or inversely proportional to rank.

03

Learning rate transfer from LoRA to full finetuning is effective across tasks.
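
The two regimes in finding 02 can be illustrated with a minimal sketch. The function name `transfer_lr` and the regime labels `"invariant"` and `"inverse"` are hypothetical and are not taken from the paper. This page does not say which initialization or scaling factor leads to which regime, so the regime is passed in explicitly rather than inferred.

```python
def transfer_lr(base_lr: float, base_rank: int, new_rank: int, regime: str) -> float:
    """Rescale a learning rate tuned at `base_rank` for use at `new_rank`.

    Illustrative only: `regime` must be chosen by the caller, because the
    mapping from LoRA configuration to regime is not given on this page.
    """
    if base_lr <= 0 or base_rank <= 0 or new_rank <= 0:
        raise ValueError("learning rate and ranks must be positive")
    if regime == "invariant":
        # Optimal learning rate does not change with rank.
        return base_lr
    if regime == "inverse":
        # Optimal learning rate is proportional to 1 / rank.
        return base_lr * base_rank / new_rank
    raise ValueError(f"unknown regime: {regime!r}")


# Example: a learning rate tuned at rank 8, reused at rank 64.
lr_invariant = transfer_lr(1e-3, base_rank=8, new_rank=64, regime="invariant")
lr_inverse = transfer_lr(1e-3, base_rank=8, new_rank=64, regime="inverse")
```

In the inverse regime, an eightfold increase in rank gives an eightfold smaller learning rate. In the invariant regime, the tuned value is reused unchanged.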

Abstract

Low-Rank Adaptation (LoRA) is a standard tool for parameter-efficient finetuning of large models. While it induces a small memory footprint, its training dynamics can be surprisingly complex as they depend on several hyperparameters such as initialization, adapter rank, and learning rate. In particular, it is unclear how the optimal learning rate scales with adapter rank, which forces practitioners to re-tune the learning rate whenever the rank is changed. In this paper, we introduce Maximal-Update Adaptation ($μ$A), a theoretical framework that characterizes how the "optimal" learning rate should scale with model width and adapter rank to produce stable, non-vanishing feature updates under standard configurations. $μ$A is inspired by the Maximal-Update Parametrization ($μ$P) in pretraining. Our analysis leverages techniques from hyperparameter transfer and reveals that the…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications