The Quest for Winning Tickets in Low-Rank Adapters
Hamed Damirchi, Cristian Rodriguez-Opazo, Ehsan Abbasnejad, Zhen Zhang, Javen Shi

TL;DR
This paper extends the Lottery Ticket Hypothesis to Low-Rank Adaptation (LoRA) in parameter-efficient fine-tuning, revealing sparse subnetworks that match dense adapter performance and proposing Partial-LoRA for efficient training.
Contribution
It demonstrates that LTH applies to LoRA, introduces Partial-LoRA to identify sparse subnetworks, and achieves significant parameter reduction with maintained or improved accuracy.
Findings
LTH holds within LoRA, enabling sparse subnetworks to match dense adapter performance.
The effectiveness of a sparse subnetwork depends more on how much sparsity is applied in each layer than on which specific weights are kept.
Partial-LoRA reduces trainable parameters by up to 87% while maintaining or improving accuracy.
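The findings above can be illustrated with a minimal NumPy sketch. It assumes masks are drawn at random for a chosen per-layer sparsity and applied elementwise to both low-rank factors. This is not the paper's Partial-LoRA procedure, whose mask-selection details are not given on this page. The names `random_mask` and `MaskedLoRALinear`, the layer shapes, and the 0.75 sparsity value are all hypothetical.

```python
import numpy as np


def random_mask(shape, sparsity, rng):
    """Binary mask with round(sparsity * size) zeros at random positions."""
    size = int(np.prod(shape))
    n_zero = int(round(sparsity * size))
    flat = np.ones(size, dtype=np.float64)
    flat[rng.permutation(size)[:n_zero]] = 0.0
    return flat.reshape(shape)


class MaskedLoRALinear:
    """Frozen weight W plus a sparse low-rank update (B * mask_B) @ (A * mask_A).

    Assumption: masks are random and fixed; only the sparsity level is chosen.
    """

    def __init__(self, weight, rank, sparsity, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                            # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, (rank, d_in))    # trainable down-projection
        self.B = np.zeros((d_out, rank))                # trainable up-projection, zero init
        self.mask_A = random_mask(self.A.shape, sparsity, rng)
        self.mask_B = random_mask(self.B.shape, sparsity, rng)

    def delta(self):
        # Sparse low-rank update added to the frozen weight.
        return (self.B * self.mask_B) @ (self.A * self.mask_A)

    def forward(self, x):
        return x @ (self.weight + self.delta()).T

    def trainable_params(self):
        # Only unmasked adapter entries count as trainable.
        return int(self.mask_A.sum() + self.mask_B.sum())


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
layer = MaskedLoRALinear(W, rank=8, sparsity=0.75)
x = rng.normal(size=(4, 32))
y = layer.forward(x)
dense = layer.A.size + layer.B.size
kept = layer.trainable_params()
print(y.shape, dense, kept)  # (4, 64) 768 192
```

Because the masks multiply the factors in the forward pass, masked entries receive zero gradient under standard backpropagation, so the trainable parameter count is the number of unmasked entries (192 of 768 in this toy setting).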
Abstract
The Lottery Ticket Hypothesis (LTH) suggests that over-parameterized neural networks contain sparse subnetworks ("winning tickets") capable of matching full model performance when trained from scratch. With the growing reliance on fine-tuning large pretrained models, we investigate whether LTH extends to parameter-efficient fine-tuning (PEFT), specifically focusing on Low-Rank Adaptation (LoRA) methods. Our key finding is that LTH holds within LoRAs, revealing sparse subnetworks that can match the performance of dense adapters. In particular, we find that the effectiveness of sparse subnetworks depends more on how much sparsity is applied in each layer than on the exact weights included in the subnetwork. Building on this insight, we propose Partial-LoRA, a method that systematically identifies said subnetworks and trains sparse low-rank adapters aligned with task-relevant subspaces of…
Peer Reviews
Decision · Submitted to ICLR 2025
1. This paper is well-written and interesting. It introduces the concept of winning tickets to low-rank adapters in a novel way and is backed by theoretical grounding.
2. From an empirical perspective, the method of masking a proportion of parameters offers more flexibility compared to existing approaches and shows strong performance across multiple experiments on vision and language models.
1. The datasets chosen by the authors are relatively simple, and the model used is comparatively small. In such tasks, which inherently may require fewer parameters [1] to learn effectively, the performance gap between full fine-tuning and PEFT methods, including LoRA, tends to be minimal. In more challenging tasks where LoRA underperforms compared to full fine-tuning, possibly due to capacity limitations [2], it is unclear whether Partial-LoRA would maintain its advantage.
2. A discussion on the i
The authors address an interesting problem and draw inspiration from previous works. The paper is well-organized and structured.
Please refer to the questions I listed.
1. The paper proposes a novel, interesting method that extends the lottery ticket hypothesis to low-rank adaptation.
2. The authors provide a good theoretical foundation to justify the LTH concept within LoRA.
3. The experiments cover a wide range of vision and language tasks, providing solid results in terms of maintaining accuracy while reducing the total number of parameters.
1. I think there is a main weakness in the presentation of the results: while the plots are nice, there should be a table showing the number of parameters and accuracy. Additionally, it should compare against methods with a similar parameter count if feasible (e.g., the current results show that the method can maintain accuracy while using few parameters; there should also be a comparison the other way around, showing that methods with a similar parameter count cannot achieve the same result, to justify
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
