Dynamic Low-Rank Sparse Adaptation for Large Language Models
Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Yang Liu, Jing Lin, Yiwu Yao, Rongrong Ji

TL;DR
This paper introduces LoSA, a dynamic low-rank sparse adaptation method that improves sparse LLM performance during fine-tuning without increasing inference latency, by adaptively sparsifying and adjusting the LoRA modules.
Contribution
LoSA seamlessly integrates low-rank adaptation into sparse LLMs, dynamically determining layer importance and rank adjustments to enhance performance efficiently.
Findings
Reduces perplexity of sparse LLaMA-2-7B by 68.73
Increases zero-shot accuracy by 16.32%
Achieves 2.60× speedup on CPU and 2.23× on GPU
Abstract
Although network sparsity effectively eases the deployment burden of Large Language Models (LLMs), it causes significant performance degradation. Applying Low-Rank Adaptation (LoRA) to fine-tune sparse LLMs is an intuitive way to counter this, but it has two shortcomings: 1) the LoRA weights cannot be integrated into sparse LLMs after training, and 2) performance recovery is insufficient at high sparsity ratios. In this paper, we introduce dynamic Low-rank Sparse Adaptation (LoSA), a novel method that seamlessly integrates low-rank adaptation into LLM sparsity within a unified framework, thereby enhancing the performance of sparse LLMs without increasing the inference latency. In particular, LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning, thus guaranteeing that the LoRA module can be…
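The first shortcoming named in the abstract, and the masking idea that addresses it, can be illustrated with a small sketch. Adding a dense low-rank update B·A to a pruned weight matrix would fill in the pruned positions, so the merged layer would no longer be sparse; masking the update with the weight's own sparsity pattern avoids that. This is not the authors' implementation: the function names, the toy matrices, and the choice to infer the mask from nonzero weight entries are all assumptions made for illustration.

```python
# Illustrative sketch only: all names, shapes, and values are hypothetical,
# not taken from the paper's code.

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_masked_lora(w_sparse, lora_b, lora_a):
    """Merge the low-rank update B @ A into a sparse weight matrix,
    keeping the update only where the sparse weight is nonzero.
    Assumption: the mask is inferred from nonzero entries of w_sparse."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + d if w != 0 else 0.0 for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_sparse, delta)
    ]

def sparsity(matrix):
    """Fraction of entries that are exactly zero."""
    flat = [v for row in matrix for v in row]
    return sum(1 for v in flat if v == 0) / len(flat)

# 2x4 weight matrix with half of its entries pruned (zeros mark pruned positions).
w_sparse = [[0.5, 0.0, -1.0, 0.0],
            [0.0, 2.0, 0.0, 0.25]]
lora_b = [[1.0], [2.0]]            # shape 2x1, i.e. rank 1
lora_a = [[0.1, 0.2, 0.3, 0.4]]    # shape 1x4

merged = merge_masked_lora(w_sparse, lora_b, lora_a)
dense_merge = [[w + d for w, d in zip(wr, dr)]
               for wr, dr in zip(w_sparse, matmul(lora_b, lora_a))]
```

Here `merged` keeps the zero pattern of `w_sparse`, while `dense_merge` (the unmasked sum) has no zeros left, which is why a plain LoRA update cannot be folded into a sparse model without losing its sparsity.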
Peer Reviews
Decision·ICLR 2025 Poster
* Addresses a relevant problem: Performance degradation in sparse LLMs is a known issue, and LoSA offers a practical solution.
* Novelty: Integrating sparsification into the low-rank adaptation process and dynamically adjusting sparsity/rank based on RMI and reconstruction errors are novel ideas.
* Strong empirical results: The experimental results show consistent improvements across various LLMs and sparsity levels.
* Inference efficiency: LoSA preserves the inference speed advantages of spa
na
1. reasonable motivation, studying the sparsification of LLM while applying LoRA.
2. clear problem formulation and related work introduction at each subproblem.
3. detailed and summarized pseudocode for connecting each step and explaining the overall algorithm.
4. strong and promising experimental results on LLMs regarding both model performance and speedup.
1. the algorithm consists of many heuristics and lacks step-by-step derivation, e.g., Eq. 7 and Eq. 9.
2. some experiment details are missing or unclear.
1. LoSA introduces a combined dynamic sparsity and rank adjustment mechanism for fine-tuning sparse LLMs. Using RMI for layer-wise sparsity rate determination and reconstruction errors for rank allocation seems a reasonable approach for preserving model performance under sparse conditions. Moreover, trying to match the sparsity pattern of the adaptation path BA and the pre-trained weight W is novel.
2. Comprehensive Empirical Evaluation: The paper’s experiments cover multiple architectures
1. The paper lacks comparisons with adaptive LoRA methods like AdaLoRA[1] and SoRA[2], which are critical for evaluating LoSA’s performance among recent dynamic rank approaches. Without these comparisons, LoSA’s relative advantage remains unclear.
2. The optimization setup in Eq. 5 (Section 2.2) assigns higher sparsity rates to layers with higher importance, which contradicts standard practices that seek to preserve the most important layers. This questionable logic may weaken the model’s repr
Taxonomy
Topics: Speech Recognition and Synthesis
