GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
Yeonjoon Jung, Daehyun Ahn, Hyungjun Kim, Taesu Kim, and Eunhyeok Park

TL;DR
GraLoRA is a partitioned low-rank adaptation method that improves parameter-efficient fine-tuning by removing LoRA's structural bottleneck, and it outperforms LoRA and other baselines on code and reasoning benchmarks.
Contribution
The paper proposes GraLoRA, a new structure that partitions weight matrices into sub-blocks with individual adapters, effectively increasing capacity and accuracy in PEFT.
Findings
Outperforms LoRA and baselines on code and reasoning benchmarks
Achieves up to a +8.5% absolute gain in Pass@1 on HumanEval+
Maintains scalability and robustness across model sizes and ranks
Abstract
Low-Rank Adaptation (LoRA) is a popular method for parameter-efficient fine-tuning (PEFT) of generative models, valued for its simplicity and effectiveness. Despite recent enhancements, LoRA still suffers from a fundamental limitation: overfitting when the bottleneck is widened. It performs best at ranks 32-64, yet its accuracy stagnates or declines at higher ranks, still falling short of full fine-tuning (FFT) performance. We identify the root cause as LoRA's structural bottleneck, which entangles the gradients of unrelated input channels and distorts gradient propagation. To address this, we introduce a novel structure, Granular Low-Rank Adaptation (GraLoRA), which partitions weight matrices into sub-blocks, each with its own low-rank adapter. With negligible computational or storage cost, GraLoRA overcomes LoRA's limitations, effectively increases the representational…
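The partitioning idea in the abstract can be illustrated with a small NumPy sketch. This is only a sketch under stated assumptions, not the authors' implementation: the k x k grid, the per-block rank of r / k, the random initialization, and the helper names `lora_delta` and `blockwise_delta` are all hypothetical choices made for illustration. Under those assumptions, the block-wise update uses the same number of adapter parameters as a single rank-r LoRA pair, while the assembled update matrix can reach a higher rank.

```python
import numpy as np


def lora_delta(out_dim, in_dim, rank, rng):
    """Standard LoRA-style update: one pair (B, A) shared by the whole matrix.

    Random values are used for both factors so the rank can be inspected;
    this is not how adapters are initialized for training.
    """
    B = rng.standard_normal((out_dim, rank))
    A = rng.standard_normal((rank, in_dim))
    return B @ A, B.size + A.size


def blockwise_delta(out_dim, in_dim, rank, k, rng):
    """Hypothetical block-wise update: a k x k grid of sub-blocks,
    each with its own low-rank pair of rank `rank // k` (an assumption)."""
    if out_dim % k or in_dim % k or rank % k:
        raise ValueError("out_dim, in_dim and rank must be divisible by k")
    bo, bi, br = out_dim // k, in_dim // k, rank // k
    delta = np.zeros((out_dim, in_dim))
    n_params = 0
    for i in range(k):
        for j in range(k):
            B = rng.standard_normal((bo, br))  # adapter for block (i, j)
            A = rng.standard_normal((br, bi))
            delta[i * bo:(i + 1) * bo, j * bi:(j + 1) * bi] = B @ A
            n_params += B.size + A.size
    return delta, n_params


rng = np.random.default_rng(0)
out_dim, in_dim, rank, k = 64, 64, 8, 4

lora_dw, lora_params = lora_delta(out_dim, in_dim, rank, rng)
block_dw, block_params = blockwise_delta(out_dim, in_dim, rank, k, rng)

lora_rank = np.linalg.matrix_rank(lora_dw)
block_rank = np.linalg.matrix_rank(block_dw)

print("parameters:", lora_params, "vs", block_params)
print("rank of update:", lora_rank, "vs", block_rank)
```

In this toy setting each block's adapter only sees its own slice of input channels, so one block's factors do not mix with inputs routed to other blocks. The rank of the single-pair update is bounded by r, whereas the block-wise update is bounded by k times r for the same parameter budget.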
Taxonomy
Topics: Speech and Audio Processing
