AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation
Ziyun Liu, Fengmiao Bian, Jian-Feng Cai

TL;DR
AdaPreLoRA is an optimizer for Low-Rank Adaptation (LoRA) that uses Adafactor-based, gradient-statistics-aware preconditioning to match or improve on existing LoRA optimizers while keeping peak GPU memory at the level of standard LoRA.
Contribution
The paper proposes AdaPreLoRA, an optimizer that combines Adafactor preconditioning with a closed-form factor-space solution, filling a gap in the design space of existing LoRA optimization methods.
Findings
AdaPreLoRA is competitive with or better than existing LoRA optimizers.
It maintains peak GPU memory at the level of standard LoRA.
Its effectiveness is demonstrated across GPT-2, Mistral-7B, Qwen2-7B, and diffusion models.
Abstract
Low-Rank Adaptation (LoRA) reparameterizes a weight update as a product of two low-rank factors, but the Jacobian of the generator mapping the factors to the weight matrix is rank-deficient. As a result, the factor-space preconditioner induced by any weight-space preconditioner is singular, and the standard chain rule cannot be uniquely inverted to map a preconditioned weight-space direction back to a factor-space update. We cast existing LoRA optimizers in a unified framework parameterized by two choices: (i) which invertible surrogate for to use, and (ii) which on to use. Existing methods occupy four families along these axes: factor-space adaptive updates, block-diagonal surrogates for , Frobenius-residual pseudoinverse methods, and Riemannian manifold constraints. Within this design space, a…
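The rank deficiency described in the abstract can be checked numerically. The sketch below is an illustration only, not the paper's method or notation: the factor names B and A, the sizes m, n, r, and the diagonal weight-space preconditioner P are all assumptions chosen for the example. It builds the Jacobian of the map (B, A) -> BA using column-major vectorization and compares its rank with the number of factor parameters. For generic full-rank factors the rank is r(m + n) - r^2, which is the dimension of the set of rank-r matrices, so the induced factor-space matrix J^T P J cannot be inverted.

```python
import numpy as np


def lora_jacobian(B, A):
    """Jacobian of vec(B @ A) w.r.t. [vec(B); vec(A)], column-major vec.

    Uses the identities vec(dB @ A) = (A^T kron I_m) vec(dB)
    and vec(B @ dA) = (I_n kron B) vec(dA).
    """
    m, r = B.shape
    r2, n = A.shape
    assert r == r2, "inner dimensions of B and A must match"
    J_B = np.kron(A.T, np.eye(m))  # shape (m*n, m*r)
    J_A = np.kron(np.eye(n), B)    # shape (m*n, r*n)
    return np.hstack([J_B, J_A])   # shape (m*n, r*(m+n))


# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))

J = lora_jacobian(B, A)
num_factor_params = r * (m + n)
jac_rank = np.linalg.matrix_rank(J)

# An arbitrary positive-definite weight-space preconditioner (assumed diagonal).
P = np.diag(rng.uniform(0.5, 2.0, size=m * n))
induced = J.T @ P @ J  # factor-space matrix induced by P
induced_rank = np.linalg.matrix_rank(induced)

print("factor parameters:", num_factor_params)
print("rank of Jacobian:", jac_rank)
print("rank of induced factor-space matrix:", induced_rank)
```

The null directions have the form dB = B M, dA = -M A for any r-by-r matrix M, since these leave the product BA unchanged to first order. This is why some choice of invertible surrogate is needed before a weight-space direction can be mapped back to the factors.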
