TL;DR
RefLoRA improves low-rank adaptation of large models by re-optimizing the low-rank factorization at each step, which yields faster convergence and better performance at negligible additional computational cost.
Contribution
This paper introduces RefLoRA, a method that identifies, at each step, the low-rank factorization minimizing an upper bound on the loss, improving the convergence and performance of LoRA-based fine-tuning.
Findings
RefLoRA converges faster than standard LoRA.
RefLoRA achieves higher accuracy than the benchmark methods it is compared against.
RefLoRA has negligible additional computational overhead.
Abstract
Low-Rank Adaptation (LoRA) lowers the computational and memory overhead of fine-tuning large models by updating a low-dimensional subspace of the pre-trained weight matrix. Albeit efficient, LoRA exhibits suboptimal convergence and noticeable performance degradation, due to inconsistent and imbalanced weight updates induced by its nonunique low-rank factorizations. To overcome these limitations, this article identifies the optimal low-rank factorization per step that minimizes an upper bound on the loss. The resultant refactored low-rank adaptation (RefLoRA) method promotes a flatter loss landscape, along with consistent and balanced weight updates, thus speeding up stable convergence. Extensive experiments evaluate RefLoRA on natural language understanding and commonsense reasoning tasks with popular large language models including DeBERTaV3, LLaMA-7B, LLaMA2-7B and LLaMA3-8B. The…
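The nonuniqueness mentioned in the abstract can be made concrete with a small sketch. It does not implement RefLoRA. It only illustrates, under assumed toy values, that two factorizations of the same low-rank update (here B·A and (cB)·(A/c)) lead to different weight changes after one plain gradient step on the factors. The helper names, the matrices, the gradient G, and the learning rate are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: hypothetical helpers, plain Python lists as matrices.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

def axpy(X, Y, alpha):
    """Return X + alpha * Y, elementwise."""
    return [[x + alpha * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def factor_step(B, A, G, lr):
    """One gradient-descent step on the factors of W = B @ A.

    G is the gradient of the loss with respect to W, so by the chain rule
    dL/dB = G @ A^T and dL/dA = B^T @ G.
    """
    grad_B = matmul(G, transpose(A))
    grad_A = matmul(transpose(B), G)
    return axpy(B, grad_B, -lr), axpy(A, grad_A, -lr)

def weight_change(B, A, G, lr):
    """Change in the product B @ A caused by one step on the factors."""
    B_new, A_new = factor_step(B, A, G, lr)
    return axpy(matmul(B_new, A_new), matmul(B, A), -1.0)

# Rank-1 factors of a 2x2 update (assumed toy values).
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]

# An equivalent factorization: scale B by c and A by 1/c.
c = 2.0
B_scaled = [[c * b for b in row] for row in B]
A_scaled = [[a / c for a in row] for row in A]

G = [[1.0, 0.0], [0.0, 1.0]]  # made-up gradient with respect to W
lr = 0.5                      # made-up learning rate

same_product = matmul(B, A) == matmul(B_scaled, A_scaled)
delta = weight_change(B, A, G, lr)
delta_scaled = weight_change(B_scaled, A_scaled, G, lr)

print("same product:", same_product)
print("weight change, original factors:", delta)
print("weight change, rescaled factors:", delta_scaled)
```

Both factorizations represent the same weight update before the step, yet the resulting weight changes differ. This dependence on the chosen factorization is the inconsistency that the abstract attributes to LoRA.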