TL;DR
This paper demonstrates that, with proper learning rate tuning, vanilla LoRA performs comparably to more complex variants across mathematical reasoning, commonsense reasoning, code generation, and instruction-following tasks at several model scales, which underscores the importance of hyperparameter optimization.
Contribution
The paper systematically re-evaluates nine LoRA variants alongside vanilla LoRA and shows that hyperparameter tuning, especially of the learning rate, is essential for a fair comparison between methods.
Findings
All LoRA methods achieve similar peak performance when learning rates are properly tuned.
Different LoRA variants favor distinct learning rate ranges.
Vanilla LoRA remains a competitive baseline after hyperparameter tuning.
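The comparison protocol these findings imply, tuning the learning rate separately for each method before comparing peak scores, can be sketched as below. This is a minimal illustration, not the authors' code: `log_spaced_grid`, `best_per_method`, and `toy_train_and_eval` are hypothetical names, the method labels are placeholders, and the toy scoring function returns synthetic numbers that are not results from the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def log_spaced_grid(low: float, high: float, points: int) -> List[float]:
    """Return `points` learning rates spaced evenly on a log10 scale."""
    if points < 2:
        return [low]
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / (points - 1)
    return [10 ** (lo + i * step) for i in range(points)]


def best_per_method(
    train_and_eval: Callable[[str, float], float],
    methods: Sequence[str],
    learning_rates: Sequence[float],
) -> Dict[str, Tuple[float, float]]:
    """For each method, sweep every learning rate and keep the best one.

    Returns a mapping: method -> (best learning rate, best score).
    """
    best: Dict[str, Tuple[float, float]] = {}
    for method in methods:
        # Score every learning rate for this method independently.
        scored = [(train_and_eval(method, lr), lr) for lr in learning_rates]
        score, lr = max(scored)
        best[method] = (lr, score)
    return best


def toy_train_and_eval(method: str, lr: float) -> float:
    """Synthetic stand-in for a real fine-tuning run (assumption).

    Each placeholder method has its own preferred learning rate; the score
    drops with the squared log-distance from it. Not data from the paper.
    """
    preferred = {"vanilla_lora": 1e-4, "hypothetical_variant": 1e-3}[method]
    return 1.0 - (math.log10(lr) - math.log10(preferred)) ** 2


grid = log_spaced_grid(1e-5, 1e-2, 4)
results = best_per_method(
    toy_train_and_eval, ["vanilla_lora", "hypothetical_variant"], grid
)
```

In a real study, `train_and_eval` would launch a fine-tuning run and return a validation metric. Comparing methods at a single shared learning rate instead of at each method's own optimum is the kind of setup the paper argues can make a variant look better or worse than it is.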
Abstract
Low-Rank Adaptation (LoRA) is the prevailing approach for efficient large language model (LLM) fine-tuning. Building on this paradigm, recent studies have proposed alternative initialization strategies, architectural modifications, and optimization adjustments, reporting substantial improvements over vanilla LoRA. However, these gains are often demonstrated under fixed or narrowly tuned hyperparameter settings, despite the known sensitivity of neural networks to training configurations. In this work, we systematically re-evaluate nine representative LoRA variants alongside vanilla LoRA through extensive hyperparameter searches over learning rate, batch size, rank, and training duration. Across tasks spanning mathematical reasoning, commonsense reasoning, code generation, and instruction following at diverse model scales, we find that different LoRA methods favor distinct learning rate…
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
